Answering your questions:
The standalone mode uses the same configuration variables as the Mesos and YARN modes to set the number of executors. The variable spark.cores.max defines the maximum number of cores used across the whole Spark context; since its default is effectively unlimited, Spark will use all the cores available in the cluster. The spark.task.cpus variable defines the number of CPUs Spark allocates to a single task; its default value is 1. Together, these two variables bound the maximum number of tasks that can run in parallel in your cluster.
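As a minimal sketch (the application name and the values are just examples, not anything prescribed), these properties can be set on a SparkConf before creating the context:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Cap the application at 8 cores and give each task 2 CPUs,
// so at most 8 / 2 = 4 tasks can run in parallel across the cluster.
val conf = new SparkConf()
  .setAppName("parallelism-example")  // hypothetical application name
  .set("spark.cores.max", "8")
  .set("spark.task.cpus", "2")

val sc = new SparkContext(conf)
```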
When you create an RDD subclass you can actually define on which machines your tasks will run. This is done in the getPreferredLocations method. But, as the method's name suggests, this is only a preference: if Spark detects that some machine is not busy, it will launch the task on that idle machine. However, I don't know the exact mechanism Spark uses to figure out which machines are idle. To improve the chances of achieving locality, we (Stratio) make each partition smaller so that each task takes less time.
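As an illustrative sketch (HostAwareRDD and HostPartition are hypothetical names, not part of Spark's API), a custom RDD can express a per-partition location preference like this:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition that remembers which host its data lives on.
case class HostPartition(index: Int, host: String) extends Partition

// Sketch of an RDD subclass that declares a locality preference per partition.
class HostAwareRDD(sc: SparkContext, hosts: Seq[String])
  extends RDD[String](sc, Nil) {

  override protected def getPartitions: Array[Partition] =
    hosts.zipWithIndex.map { case (h, i) => HostPartition(i, h) }.toArray

  // Only a preference: the scheduler may still run the task elsewhere,
  // for example on an idle machine.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    Seq(split.asInstanceOf[HostPartition].host)

  override def compute(split: Partition, context: TaskContext): Iterator[String] =
    Iterator(s"partition ${split.index} preferred on ${split.asInstanceOf[HostPartition].host}")
}
```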
For each Spark operation, the number of tasks is defined by the length of the RDD's partitions array. This array comes from the getPartitions method, which you have to override if you want to develop a new RDD subclass; it describes how the RDD is split and where the data for each partition is. When you combine two or more RDDs, the partitioning of the result depends on the operation: a join with the default partitioner uses the largest number of partitions among the RDDs involved, while union concatenates the parents' partitions. For example, if you join RDD1 with 100 partitions and RDD2 with 1,000 partitions, the next operation on the resulting RDD will run 1,000 tasks. Note that a high number of partitions does not necessarily mean more data.
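As a small illustration of how partition counts propagate (assuming an existing SparkContext named sc; the counts are arbitrary):

```scala
// Build two RDDs with different numbers of partitions.
val rdd1 = sc.parallelize(1 to 10000, numSlices = 100)   // 100 partitions
val rdd2 = sc.parallelize(1 to 10000, numSlices = 1000)  // 1000 partitions

val joined = rdd1.map(x => (x, x)).join(rdd2.map(x => (x, x)))
println(joined.partitions.length)   // 1000 with the default partitioner

val unioned = rdd1.union(rdd2)
println(unioned.partitions.length)  // 1100: union concatenates the parents' partitions
```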