Next in this section on Hadoop Architecture, let’s see how Hadoop works.
How does Hadoop work?
Hadoop runs code across a cluster of computers and performs the following tasks:
- Data is initially divided into files and directories. Files are then divided into consistently sized blocks ranging from 128 MB in Hadoop 2 to 64 MB in Hadoop 1.
- Then, the files are distributed across various cluster nodes for further processing of data.
- The JobTracker starts its scheduling programs on individual nodes.
- Once all the nodes are done with scheduling, the output is returned.
Data from HDFS is consumed through MapReduce applications. HDFS is also responsible for multiple replicas of data blocks that are created along with the distribution of nodes in a cluster, which enables reliable and extremely quick computations.
So, in the first step, the file is divided into blocks and is stored in different DataNodes. If a job request is generated, it is directed to the JobTracker.
The JobTracker doesn’t really know the location of the file. So, it contacts with the NameNode for this.
The NameNode will now find the location and give it to the JobTracker for further processing.
Now, since the JobTracker knows the location of the blocks of the requested file, it will contact the TaskTracker present on a particular DataNode for the data file.
The TaskTracker will now send the data it has to the JobTracker.
Finally, the JobTracker will collect the data and send it back to the requested source.
What is Big Data? If you have more queries related to Big Data Hadoop, do post them on Big Data Hadoop and Spark Community!
How does Yahoo! use Hadoop Architecture?
In Yahoo!, there are 36 different Hadoop clusters that are spread across Apache HBase, Storm, and YARN, i.e., there are 60,000 servers in total made from 100s of distinct hardware configurations. Yahoo! runs the largest multi-tenant Hadoop installation in the world. There are approximately 850,000 Hadoop jobs daily, which are run by Yahoo!.
The cost of storing and processing data using Hadoop is the best way to determine whether Hadoop is the right choice for your company. When comparing on the basis of the expense for managing data, Hadoop is much cheaper than any legacy systems.
Interested in learning ‘What is Big Data Hadoop?’ Check out the Big Data Hadoop Training in London!
In this section of the Hadoop tutorial, we learned about the two layers present in the Hadoop architecture. We also saw how Yahoo! uses Hadoop. Now that we have seen the robust architecture of Hadoop, let’s move on with our next section of the tutorial, i.e., Hadoop installation.
What is Big Data Hadoop? Enroll in our Big Data Hadoop Training now and kick-start your career!