The locality level tells us how close the data is to the code that processes it.
Spark defines the following locality levels:
PROCESS_LOCAL - the data and the processing are in the same JVM.
NODE_LOCAL - the data and the processing are on the same node but in different executors. This level is slower than the previous one because the data has to move between processes.
RACK_LOCAL - the data is on a different node than the processing, but both nodes are in the same rack. Here the data has to be moved over the network.
NO_PREF - the data has no locality preference and can be accessed equally quickly from anywhere.
ANY - the data is elsewhere on the network, not in the same rack.
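To make the distinctions above concrete, here is a minimal sketch in plain Python (not Spark code). The Location type and the locality_level function are hypothetical names invented for this illustration; Spark's scheduler makes this decision internally and does not expose such a function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    """Hypothetical description of where something lives in the cluster."""
    executor_id: Optional[str]  # None if the data is not held by an executor (e.g. a file on disk)
    host: str
    rack: str


def locality_level(data: Optional[Location], task: Location) -> str:
    """Classify the data's position relative to the executor running the task."""
    if data is None:
        return "NO_PREF"        # the data has no preferred location
    if data.executor_id is not None and data.executor_id == task.executor_id:
        return "PROCESS_LOCAL"  # same executor, hence same JVM
    if data.host == task.host:
        return "NODE_LOCAL"     # same machine, different process
    if data.rack == task.rack:
        return "RACK_LOCAL"     # different machine, same rack
    return "ANY"                # somewhere else on the network


task = Location("exec-1", "host-a", "rack-1")
print(locality_level(Location("exec-2", "host-a", "rack-1"), task))
```

The checks run from most local to least local, so the first match is the best level that applies.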
How long Spark waits to run a task close to its data before giving up is controlled by the property
spark.locality.wait.
Its value is a duration (the default is 3s). Sometimes no executor close to the data is free, so the task has to wait before it can be scheduled there. However, if the time defined in spark.locality.wait expires, Spark falls back to a less local level.
That is: PROCESS_LOCAL -> NODE_LOCAL -> RACK_LOCAL -> ANY
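The fallback chain can be sketched as a small simulation in plain Python. This is a simplified model, not Spark's scheduler: the allowed_level function is a hypothetical name, it assumes the same wait applies at every level, and it ignores the fact that Spark resets its timer whenever a task is launched. The 3-second default mirrors the default of spark.locality.wait.

```python
# Levels in the order Spark relaxes them, most local first.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]


def allowed_level(waited_seconds: float, wait_per_level: float = 3.0) -> str:
    """Return the least local level accepted after waiting this long.

    Each time another full wait_per_level has elapsed, the model moves
    one step down the chain, stopping at ANY.
    """
    if wait_per_level <= 0:
        return LEVELS[-1]  # no waiting at all: any level is accepted at once
    steps = int(waited_seconds // wait_per_level)
    return LEVELS[min(steps, len(LEVELS) - 1)]


for waited in (0.0, 3.0, 6.5, 100.0):
    print(waited, allowed_level(waited))
```

In a real job the wait is changed through configuration, for example by passing --conf spark.locality.wait=10s to spark-submit. Spark also accepts per-level overrides named spark.locality.wait.process, spark.locality.wait.node and spark.locality.wait.rack.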