Explore Courses Blog Tutorials Interview Questions
0 votes
in Big Data Hadoop & Spark by (55.6k points)

Can anyone explain what is distributed cache in Hadoop?

1 Answer

0 votes
by (119k points)
edited by

Distributed cache is a service by Hadoop’s MapReduce framework to cache small to moderate read-only files when needed.

Once a file is cached for a specific job, Hadoop will broadcast them to each DataNode both in the system and in memory, where map and reduce tasks are executed.

When required, it helps to access and read the cache file easily and populate any collection (like an array, or hashmap) in your code.

Want to learn big data? Enroll in this big data course taught by industry experts.

This is most frequently asked in interviews and you can check out this blog on Hadoop Interview Questions for other FAQs in an interview with simple and lucid explanations. 

Also, check this video for more information:

Browse Categories