What are side data distribution techniques in Hadoop?

1 Answer


Side data is the extra read-only data a job needs in order to process its main dataset. There are two ways to distribute it in Hadoop:

  • Via the job configuration: This is only a viable option when the side data is small (a few kilobytes). Anything larger puts unnecessary pressure on the memory of the Hadoop daemons, particularly when many jobs are running at once.

  • Via the distributed cache: Hadoop's distributed cache mechanism copies the required files to the nodes that run the tasks, which makes it a better option than serializing side data into the job configuration.
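
To make the first technique concrete, here is a minimal sketch of packing a small lookup table into a single string that could be stored in the job configuration. The class name SideData, the size limit, and the property key mentioned below are all hypothetical and chosen only for illustration; the code uses just the Java standard library so it can be run without a cluster.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: packs a small lookup table into one string property. */
final class SideData {
    // Illustrative limit only (an assumption); the answer just says "kilobytes".
    static final int MAX_ENCODED_CHARS = 16 * 1024;

    private SideData() {}

    /** Serializes the table using the java.util.Properties text format. */
    static String encode(Map<String, String> table) {
        Properties props = new Properties();
        props.putAll(table);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            // Not expected for an in-memory writer.
            throw new IllegalStateException(e);
        }
        String encoded = out.toString();
        if (encoded.length() > MAX_ENCODED_CHARS) {
            throw new IllegalArgumentException(
                "side data too large for the job configuration; "
                + "consider the distributed cache instead");
        }
        return encoded;
    }

    /** Rebuilds the table from a string produced by encode. */
    static Map<String, String> decode(String encoded) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(encoded));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        Map<String, String> table = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            table.put(key, props.getProperty(key));
        }
        return table;
    }
}
```

In a real job, the driver would typically call job.getConfiguration().set("lookup.table", SideData.encode(table)), and the mapper's setup method would call SideData.decode(context.getConfiguration().get("lookup.table")); the key "lookup.table" is made up for this example. For side data beyond a few kilobytes, the second technique usually means registering a file with job.addCacheFile(uri) in the driver and reading it in the task via context.getCacheFiles().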
