I'm running a Spark job in speculation mode. I have around 500 tasks and around 500 gzip-compressed files of about 1 GB each. In each job, 1-2 tasks keep failing with the error below and are then rerun dozens of times, which prevents the job from completing.

org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 0

Any idea what this error means and how to overcome it?

1 Answer


I encountered a similar problem. I checked the YARN logs on the specific nodes and found an out-of-memory problem, so YARN had interrupted the execution. This happened because I had given the worker node more memory than its default maximum capacity. As a result, Spark crashed while trying to store objects for shuffling, with no memory left.
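To see how a memory setting can exceed what a node allows, here is a rough sizing sketch in plain Python. It assumes Spark's documented default for executor memory overhead on YARN (10% of executor memory, with a 384 MiB minimum). The function names and the node limit are hypothetical and only for illustration; check your own cluster's YARN configuration for the real limit.

```python
def executor_container_mb(executor_memory_mb: int,
                          overhead_factor: float = 0.10,
                          min_overhead_mb: int = 384) -> int:
    """Approximate size of the YARN container requested for one executor.

    Assumes the default overhead rule: max(384 MiB, 10% of executor memory).
    """
    overhead_mb = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    return executor_memory_mb + overhead_mb


def fits_on_node(executor_memory_mb: int, node_limit_mb: int) -> bool:
    """Return True if the executor heap plus overhead fits under the node limit.

    node_limit_mb is a hypothetical stand-in for whatever maximum your
    YARN setup allows per container.
    """
    return executor_container_mb(executor_memory_mb) <= node_limit_mb


if __name__ == "__main__":
    # An 8 GiB executor on a node that allows only 8 GiB per container.
    print(executor_container_mb(8192))
    print(fits_on_node(8192, node_limit_mb=8192))
```

The point is that the executor heap alone is not the whole request: the overhead is added on top, so a heap setting equal to the node limit already does not fit.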

I would suggest that you either add swap, or configure the worker/executor to use less memory and also use the MEMORY_AND_DISK storage level for several of your persist calls.
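As a minimal sketch of that suggestion in PySpark: the memory values, app name, input path and the comma-separated key layout below are all placeholders I made up, not values from the original job. `spark.executor.memoryOverhead` is the property name in Spark 2.3 and later (older versions on YARN use `spark.yarn.executor.memoryOverhead`). The Spark imports sit inside `run_job` so the settings helper can be used without PySpark installed.

```python
def spark_settings(executor_memory: str = "4g",
                   memory_overhead: str = "1g") -> dict:
    """Build conservative executor memory settings (placeholder values)."""
    return {
        "spark.executor.memory": executor_memory,
        "spark.executor.memoryOverhead": memory_overhead,
    }


def run_job(input_path: str):
    """Hypothetical job: count records per key, persisting with disk spill."""
    # Imported here so that this function is the only part needing PySpark.
    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("shuffle-memory-sketch")
    for key, value in spark_settings().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    lines = spark.sparkContext.textFile(input_path)
    # Assumes comma-separated lines whose first field is the key.
    pairs = lines.map(lambda line: (line.split(",")[0], 1))

    # MEMORY_AND_DISK spills partitions that do not fit in memory to disk
    # instead of dropping them.
    pairs.persist(StorageLevel.MEMORY_AND_DISK)

    counts = pairs.reduceByKey(lambda a, b: a + b)
    return counts.take(10)
```

The same two settings can also be passed on the command line with `--conf` when submitting the job, which avoids hard-coding them.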