Big Data Overview
With new technologies emerging every day, the amount of data being generated is growing exponentially. To make use of this information, one needs to identify which data is worth analyzing and set aside what is irrelevant.
Big Data is a collection of large datasets that cannot be adequately processed using traditional processing techniques.
Big data is characterized by the variety, volume, and velocity of the data produced. Some of the fields that come under Big Data are listed below.
- Black Box Data: The black box is a device carried on aircraft that accumulates flight information, including conversations among the crew and their communications with ground staff.
- Social Media Data: Information generated by social media websites such as Facebook and Twitter.
- Stock Exchange Data: Information about the “buy” and “sell” decisions that customers make on their shares.
- Power Grid Data: Information about the power consumed by a particular node with respect to a base station.
Big data comes in three types:
- Structured data: Relational data.
- Semi-structured data: XML data.
- Unstructured data: Word, PDF, Text, Media Logs.
Big data can be put to practical use in several ways:
- Campaigns, promotions, and other advertising can be optimized based on data collected from social media.
- Production can be planned using data from customers about their preferences and their perception of the product.
- Medical services can be optimized by reviewing patients' medical histories.
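The three data types listed earlier (structured, semi-structured, and unstructured) differ mainly in how much schema the data carries with it. The following is a minimal sketch of that difference using only the Python standard library; the trade records, the XML snippet, and the function names are made up for illustration and do not come from any real system.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET


def structured_total():
    """Structured: relational rows with a fixed schema, queried with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (symbol TEXT, shares INTEGER)")
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?)",
        [("ABC", 100), ("XYZ", 50), ("ABC", 25)],
    )
    (total,) = conn.execute(
        "SELECT SUM(shares) FROM trades WHERE symbol = ?", ("ABC",)
    ).fetchone()
    conn.close()
    return total


def semi_structured_symbols():
    """Semi-structured: XML is self-describing, but fields may vary per record."""
    doc = "<trades><trade symbol='ABC' shares='100'/><trade symbol='XYZ'/></trades>"
    root = ET.fromstring(doc)
    # The second record has no 'shares' attribute, which XML permits.
    return [trade.get("symbol") for trade in root.findall("trade")]


def unstructured_shares():
    """Unstructured: free text has no schema, so fields are extracted heuristically."""
    note = "Customer called and asked to sell 25 shares of ABC today."
    match = re.search(r"(\d+) shares of (\w+)", note)
    return (int(match.group(1)), match.group(2)) if match else None
```

The structured case can rely on SQL because every row has the same columns, while the unstructured case needs a pattern that may miss differently worded text.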
Characteristics of Big Data
Big data has three characteristics: volume, velocity, and variety.
Accurate analysis of big data helps increase operational efficiency, reduce costs, and reduce risks for the business.
To capitalize on big data, one requires infrastructure that can manage and process huge volumes of structured and unstructured data in real time while ensuring data privacy and security.
Many technologies for handling big data are available from vendors including Amazon, IBM, and Microsoft. To choose among them, one should first examine the two classes of technology:
Operational Big Data
This class includes systems such as MongoDB, which provide operational capabilities for interactive, real-time workloads where data is primarily captured and stored.
NoSQL Big Data systems are designed to take advantage of new cloud computing architectures, which allow massive computations to be run inexpensively and efficiently. This makes operational big data workloads much easier to manage, and cheaper and faster to implement.
Analytical Big Data
This class includes systems such as Massively Parallel Processing (MPP) database systems and MapReduce, which provide analytical capabilities for retrospective and complex analysis.
MapReduce provides a method of analyzing data that complements the capabilities provided by SQL, and a system based on MapReduce can be scaled up from a single server to thousands of high-end and low-end machines.
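To make the MapReduce idea concrete, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to word counting. It only illustrates the programming model, not a real framework: the function names are hypothetical, and an actual system would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = []
    for doc in documents:  # in a real cluster, each document could be mapped on a different machine
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call looks at one document and each reduce call looks at one key, the work can be split across machines without the pieces needing to coordinate, which is what allows this model to scale out.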
Operational vs. Analytical Systems
|Characteristic||Operational||Analytical|
|Latency||1 ms – 100 ms||1 min – 100 min|
|Concurrency||1000 – 100,000||1 – 10|
|Access Pattern||Writes and Reads||Reads|
|End User||Customer||Data Scientist|
|Technology||NoSQL||MapReduce, MPP Database|
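The difference in access pattern can be sketched with a toy in-memory store. This is an illustration under simplifying assumptions, not how any particular NoSQL or MPP product works; the `OrderStore` class and its methods are hypothetical.

```python
class OrderStore:
    """Toy store contrasting operational and analytical access patterns."""

    def __init__(self):
        self._orders = {}

    def record_order(self, order_id, customer, amount):
        """Operational write: touches exactly one record."""
        self._orders[order_id] = {"customer": customer, "amount": amount}

    def get_order(self, order_id):
        """Operational read: a single key lookup, expected to return quickly."""
        return self._orders.get(order_id)

    def revenue_by_customer(self):
        """Analytical read: scans every record to build an aggregate."""
        totals = {}
        for order in self._orders.values():
            customer = order["customer"]
            totals[customer] = totals.get(customer, 0) + order["amount"]
        return totals
```

The operational methods do a small, constant amount of work per call and so suit many concurrent customers, while the analytical method's cost grows with the size of the whole dataset, which is consistent with the higher latency and lower concurrency shown in the table.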
The barriers to working with big data include:
- Capturing data
Organizations typically rely on enterprise servers to overcome these barriers.