What's the meaning of “Locality Level” on a Spark cluster?

Question

2 Answers

Shivangi · Answer 1 · 2019-07-09T07:03:45+0000

The locality level tells you how close a task runs to the data it reads.

Spark defines the following locality levels:

PROCESS_LOCAL - the data is in the same JVM as the code that processes it. This is the fastest level.
NODE_LOCAL - the data is on the same node but not in the same JVM, for example in another executor on that node. This level is slower than the previous one because the data has to move between processes.
RACK_LOCAL - the data is on a different node than the task, but both nodes are on the same rack, so the data has to be moved over the network.
NO_PREF - the data has no locality preference and can be accessed equally quickly from anywhere.
ANY - the data is elsewhere on the network and not on the same rack.

How long Spark waits for a free slot at a given locality level before giving up on that level is controlled by the spark.locality.wait setting.

Its value is a duration (the default is 3s). A task cannot always be scheduled next to its data right away, for example when the executors holding that data are busy, so Spark waits for a slot to free up. If the time defined in spark.locality.wait expires first, Spark tries a less local level.

That is: PROCESS_LOCAL -> NODE_LOCAL -> RACK_LOCAL -> ANY
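
To make the fallback order concrete, here is a minimal sketch in plain Python. It is a simplified model, not Spark's actual scheduler code: the names LEVELS, DEFAULT_WAITS and allowed_level are made up for illustration, NO_PREF is left out, and the model assumes the wait for each level simply adds up while no task is launched. The 3 second values mirror the documented default of spark.locality.wait, which the per-level settings spark.locality.wait.process, spark.locality.wait.node and spark.locality.wait.rack inherit unless overridden.

```python
# Simplified model of locality fallback; NOT Spark's real scheduler logic.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]

# Hypothetical per-level waits in seconds, mirroring the 3s default of
# spark.locality.wait (and of its .process / .node / .rack variants).
DEFAULT_WAITS = {"PROCESS_LOCAL": 3.0, "NODE_LOCAL": 3.0, "RACK_LOCAL": 3.0}


def allowed_level(seconds_waiting, waits=DEFAULT_WAITS):
    """Return the least local level a task may run at after waiting this long."""
    remaining = seconds_waiting
    for level in LEVELS[:-1]:
        wait = waits[level]
        if remaining < wait:
            return level
        remaining -= wait  # this level's wait expired, fall back to the next one
    return "ANY"


for t in (0.0, 3.5, 7.0, 10.0):
    print(t, allowed_level(t))
```

In this model, setting every wait to 0 makes allowed_level return ANY immediately, which corresponds to telling Spark not to wait for locality at all.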

Amit Rawat · Answer 2 · 2019-09-18T12:51:40+0000

The locality level indicates how a task accessed its data. When a node finishes all its work and its CPU becomes idle, Spark may decide to start another pending task there, even though that task has to fetch its data from elsewhere. So ideally all your tasks should run process-local, since that level has the lowest data access latency.

