+9 votes
2 views
in Big Data Hadoop & Spark by (1.5k points)

As Wikipedia states:

"The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use."

How is this related to Big Data? Is it correct to say that Hadoop does data mining in parallel?

4 Answers

+7 votes
by (13.2k points)

Short explanation:

Data mining and big data are two different things. Both involve working with large datasets, but the terms describe different kinds of activity. Big data refers to collections of large datasets (e.g. datasets too large to be handled easily in Excel sheets). Data mining, on the other hand, refers to the activity of going through a large body of data to look for relevant or pertinent information.

Detailed Explanation:

What is Big Data?

Big data refers to a huge amount of data that is not easy to handle in conventional ways. It may be structured, semi-structured or unstructured. It is commonly characterised by 5 Vs:

  1. Volume

    Refers to the amount of data (it can run into quintillions of bytes).

  2. Variety

    Refers to the types of data we can use (structured, unstructured or semi-structured).

  3. Value

    Refers to the worth of the data being extracted.

  4. Veracity

    Refers to the quality or trustworthiness of the data we have.

  5. Velocity

    Refers to how fast our data is growing.

Why is Big Data Important?

Most things today are driven by the monetary benefit they bring. Big data tools help provide meaningful information for making better business decisions, and they can also be used to study other things that could benefit humanity.

Why is Data Mining important?

Data mining is important for various reasons. The most vital is that it helps us understand what is relevant in the data and make good use of it to assess new data as it comes in. This in turn branches into use cases in areas such as the healthcare industry and financial market analysis.

Comparison

Having understood both concepts, we can say they are two very different things. The main idea in data mining is to dig into the data and analyse patterns and relationships, which can then feed prediction algorithms such as linear regression. The main concerns in big data, on the other hand, are the velocity, source and security of the huge amount of data at our disposal.
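To make the "find a relationship, then use it to predict" idea concrete, here is a minimal sketch in Python of a least-squares line fit. The function name `fit_line` and the spend/sales numbers are made up purely for illustration; a real project would normally use a library rather than hand-rolled code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend vs. units sold
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(spend, sales)

# Use the mined relationship to predict sales for an unseen spend value
predicted = slope * 6.0 + intercept
```

The "mining" step here is discovering the slope and intercept from past data; the prediction step simply reuses them on new input. Nothing in it requires the dataset to be big.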

Data mining does not depend on big data: it can be done on any amount of data, big or small (though more data is usually preferable, as more examples tend to give more accurate results). Big data, on the other hand, depends heavily on data mining, because a large volume of data is of no use until it is analysed.

+2 votes
by (10.9k points)
edited by
Following are some differences between data mining and big data:

1. Big data is a term for a large amount of data, whereas data mining refers to a deep dive into that data to extract useful information from it.

2. Big data is a concept rather than a precise term, whereas data mining is a technique for analyzing data.

3. Big data contains structured, semi-structured and unstructured data, whereas data mining typically works on structured data, such as relational and dimensional databases.

4. Big data mainly focuses on the many relationships between data, whereas data mining focuses on the details of the data.
0 votes
by (33.2k points)

You can say that big data is a trending field today, whereas data mining is an old domain.

Before big data, these tasks came under data mining:

  • Collecting data

  • Storing data

  • Machine learning / AI

  • Non-ML data mining (as in "knowledge discovery", where the term data mining was actually coined)

  • Business rules and analytics

  • Visualization

Big data is about huge amounts of data, such as hundreds or thousands of terabytes, whereas data mining is not inherently about large data sets.

Data mining consists of exploring data, finding patterns and applying machine learning to data. Nowadays, this is often called data science.

Hope this answer helps.

0 votes
by (90.8k points)

Big data is a term that covers a collection of frameworks and tools for working with very large data sets, including for data mining.

Hadoop is a framework that splits very large data sets into blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x and later) and stores them in HDFS (Hadoop Distributed File System). When the execution logic (MapReduce) is submitted as bytecode to process the data stored in HDFS, it takes input splits based on the blocks (the split size can be configured) and performs the extraction and computation in the Mapper and Reducer processes, which run in parallel across the cluster. In this way, you can do ETL, data mining, data computation, etc.
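The flow described above can be sketched in plain Python as a single-process word count. This is only an illustration of the split, map, shuffle and reduce steps, not Hadoop's actual API: all the function names (`split_into_blocks`, `mapper`, `shuffle`, `reducer`, `run_job`) are hypothetical, blocks are counted in lines rather than megabytes, and nothing actually runs in parallel.

```python
from collections import defaultdict


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size blocks, loosely mimicking HDFS blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def mapper(block):
    """Emit a (word, 1) pair for every word in one block."""
    for line in block:
        for word in line.lower().split():
            yield word, 1


def shuffle(mapped_pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records, block_size=2):
    blocks = split_into_blocks(records, block_size)
    # On a real cluster each block would be handled by a separate mapper task
    mapped = [pair for block in blocks for pair in mapper(block)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


lines = [
    "big data is big",
    "data mining finds patterns",
    "hadoop stores big data",
]
counts = run_job(lines)
```

Because each mapper only sees its own block and each reducer only sees one key, the same logic can be spread over many machines. That is the sense in which Hadoop can run a data mining job in parallel: Hadoop provides the parallel execution, and the mining logic is whatever you put in the mapper and reducer.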
