in Big Data Hadoop & Spark by (830 points)

What does the "Locality Level" heading mean, and what do its 5 statuses (data local --> process local --> node local --> rack local --> any) mean?


2 Answers

by (13.2k points)

The locality level tells you how close a task's input data was to the place where the task ran.

Spark defines the following locality levels:

  • PROCESS_LOCAL - data and processing are in the same JVM, for example a partition cached in the executor that runs the task.

  • NODE_LOCAL - data and processing are on the same node but not in the same executor, for example the data is in HDFS on that node or in another executor on the same node. This level is slower than PROCESS_LOCAL because the data has to be moved between processes.

  • RACK_LOCAL - data is located on a different node from the one doing the processing, but both nodes are on the same rack. Here the data has to be moved over the network.

  • NO_PREF - the data has no locality preference; it can be accessed equally quickly from anywhere.

  • ANY - data is elsewhere on the network, not on the same rack.
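To make the definitions above concrete, here is a small pure-Python sketch (not Spark code) that classifies a task given where its data lives. The Placement class, its rack/host/executor fields, and the locality_level function are hypothetical illustrations; only the level names come from Spark. The enum values follow the order Spark's scheduler uses internally, which puts NO_PREF between NODE_LOCAL and RACK_LOCAL.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Locality(IntEnum):
    # Same names as Spark's locality levels; lower value = closer to the data.
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    NO_PREF = 2
    RACK_LOCAL = 3
    ANY = 4


@dataclass(frozen=True)
class Placement:
    # Hypothetical location model: rack and host names, plus the executor (JVM)
    # holding the item, or None if it only sits on the node's disk (e.g. an HDFS block).
    rack: str
    host: str
    executor: Optional[str] = None


def locality_level(data: Optional[Placement], task: Placement) -> Locality:
    """Classify how close `data` is to the executor that runs `task`."""
    if data is None:
        return Locality.NO_PREF  # the data has no preferred location
    if data.host == task.host:
        if data.executor is not None and data.executor == task.executor:
            return Locality.PROCESS_LOCAL  # same JVM, e.g. a cached partition
        return Locality.NODE_LOCAL  # same machine, different process
    if data.rack == task.rack:
        return Locality.RACK_LOCAL  # other machine, same rack: network hop
    return Locality.ANY  # other rack


task = Placement("rack1", "host-a", "exec-1")
print(locality_level(Placement("rack1", "host-a", "exec-1"), task).name)  # PROCESS_LOCAL
print(locality_level(Placement("rack1", "host-b"), task).name)  # RACK_LOCAL
```

In real Spark the preferred locations come from the RDD partitions (for example HDFS block locations or cached blocks); this model only mirrors the definitions in the list.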

Data locality is controlled by how long the scheduler is willing to wait for a more local slot before giving up, which is set by

spark.locality.wait.

Its value is a duration (the default is 3s). A task may not be launchable at its preferred location right away, for example because every executor on that node is busy, so the scheduler waits. If the time defined in spark.locality.wait expires, Spark tries the next, less local level.

That is: process local -> node local -> rack local -> any
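The fallback chain above can be sketched as a toy model in Python. The allowed_level function is hypothetical and deliberately simplified: it assumes each level gets one full spark.locality.wait window (3 seconds by default) before the scheduler steps down, and that ANY never times out. Spark's actual scheduler is more involved (for example in when the wait timer restarts), and Spark also offers per-level overrides (spark.locality.wait.process, spark.locality.wait.node, spark.locality.wait.rack) that this model ignores.

```python
from enum import IntEnum


class Level(IntEnum):
    # The fallback chain from the answer, closest first (NO_PREF is not part of it).
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    RACK_LOCAL = 2
    ANY = 3


def allowed_level(waited_s: float, wait_s: float = 3.0) -> Level:
    """Least-local level a task may use after waiting `waited_s` seconds.

    Toy model: every level gets the same `wait_s` window (like a single
    spark.locality.wait value), then the scheduler falls back one level.
    """
    if wait_s <= 0:
        return Level.ANY  # a zero wait means: do not hold out for locality at all
    steps = int(waited_s // wait_s)  # how many full windows have expired
    return Level(min(steps, Level.ANY))  # never go past ANY


for t in (0, 3, 6, 9):
    print(t, allowed_level(t).name)
```

Raising the wait trades scheduling delay for better locality; lowering it (down to 0) launches tasks sooner on less local executors.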

by (26.3k points)
The locality level indicates how close the data a task read was to where the task ran. When a node has finished all its work and its CPU is idle, Spark may decide to start another pending task on it even if that task has to fetch its data from another place. So ideally, all your tasks should run locally, since local access has lower data access latency.
