Explore Courses Blog Tutorials Interview Questions
0 votes
in Big Data Hadoop & Spark by (32.1k points)
Can anyone tell me what is Big Data and Hadoop?

1 Answer

0 votes
by (45.3k points)

In simple terms, Big Data is the data that can be considered to have the 5 Vs: Volume, velocity, variety, veracity, and value.

  • Volume: Any data that comes in huge volumes can be called Big Data. Today, data is generated in terabytes and petabytes, making them ‘big’ enough to be considered as Big Data.
  • Velocity: A Big Data problem has to handle the data that is generated continuously at high speed. The applications of the day produce huge amounts of data every second. So, to deal with this multiplying data efficiently, Big Data Analysts would require smart tools and techniques.
  • Variety: Here, the term ‘variety’ refers to the heterogeneous nature of the data. That is, the real-time data is always produced by heterogeneous sources, and therefore, it can be either structured or unstructured, or even semi-structured. This variety is one of the biggest factors that make Big Data difficult to deal with.
  • Veracity: Another main characteristic of Big Data is its veracity. It refers to the errors and inconsistencies that might be present in the data. When working with Big Data, analysts should be extra careful to derive solutions that can cover up these noises well.
  • Value: Although there are chances of having missing values, errors, or inconsistencies in data, the data should always be beneficial for extracting insights from. For that, it needs to have useful information within it. If the attributes present inside the data are not valuable enough, they cannot be used for further analyses.

Hadoop, on the other hand, is an open-source data framework that is meant for storing and analyzing Big Data. Developed in Java, Hadoop mainly consists of two core components: HDFS and MapReduce. 

Hadoop Distributed File System (HDFS) is responsible for storing huge amounts of data efficiently and in a fault-tolerant manner. It stores the data across huge distributed clusters. MapReduce is Hadoop's processing engine, and it is meant for parallel computation of the data.

If you are looking for an online course to learn Big Data, check out this Big Data Training course by Intellipaat.

Browse Categories