
Google's Dremel is described here. What's the difference between Dremel and MapReduce?

1 Answer


Dremel is distributed system software developed at Google for interactively querying massive datasets (such as event or log files). It is a data analysis tool designed to run queries on large structured datasets.

Dremel is the query engine used in Google's BigQuery service.

It is the inspiration for Apache Impala, Apache Drill, and Dremio; Apache Drill in particular is an Apache-licensed platform that includes a distributed SQL execution engine modeled on Dremel.

MapReduce, on the other hand, is not designed for interactive data analysis. It is a software framework (and programming model) that allows a collection of nodes to tackle distributed computational problems over large datasets in batch.
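To make the programming model concrete, here is a toy, single-process sketch of MapReduce's three phases (map, shuffle, reduce) applied to the classic word count. This is illustrative only; a real framework like Hadoop distributes these phases across many nodes and handles sorting, fault tolerance, and I/O.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the user-defined mapper to every input record."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group intermediate (key, value) pairs by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped, reducer):
    """Apply the user-defined reducer to each key's list of values."""
    return {key: reducer(key, values) for key, values in grouped.items()}

# Classic word-count job: emit (word, 1) per word, then sum per word.
def word_mapper(line):
    for word in line.split():
        yield (word, 1)

def count_reducer(word, counts):
    return sum(counts)

lines = ["hadoop runs mapreduce", "dremel queries data", "hadoop stores data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(counts["hadoop"])  # 2
```

The key point is that the framework owns the data movement (the shuffle); the user supplies only the mapper and reducer functions.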

An open-source implementation of MapReduce, Hadoop, combined with Hive (data warehouse software), also allows data analysis over massive datasets using a SQL-style syntax. Hive essentially translates queries into MapReduce jobs. In contrast to Dremel's columnar storage format (ColumnIO), Hive relies on techniques such as table indexing to make queries fast.
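As a simplified sketch of that translation (real Hive query plans are far more involved), a query like `SELECT status, COUNT(*) FROM logs GROUP BY status` maps naturally onto map, shuffle/sort, and reduce phases; the table name `logs` and its rows below are hypothetical:

```python
from itertools import groupby

# Hypothetical rows standing in for a Hive table named `logs`.
logs = [
    {"status": 200}, {"status": 404}, {"status": 200}, {"status": 500},
]

# Map phase: emit (group-by key, 1) for every row,
# mirroring SELECT status, COUNT(*) ... GROUP BY status.
pairs = [(row["status"], 1) for row in logs]

# Shuffle/sort phase: MapReduce sorts pairs so equal keys are adjacent.
pairs.sort(key=lambda kv: kv[0])

# Reduce phase: sum the ones for each key to produce the COUNT(*).
result = {key: sum(v for _, v in group)
          for key, group in groupby(pairs, key=lambda kv: kv[0])}
print(result)  # {200: 2, 404: 1, 500: 1}
```

Each such query launches at least one full MapReduce job, which is why Hive has batch-style latency while Dremel can answer similar aggregations interactively.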

Google does not intend Dremel as a replacement for MapReduce and Hadoop; it positions Dremel as a complement to these frameworks. According to Google, Dremel is frequently used to analyze MapReduce results or to serve as a test run for large-scale computations. Dremel can execute many queries over such data that would ordinarily require a sequence of MapReduce jobs, but at a fraction of the execution time; in Google's experiments it outperformed MapReduce by orders of magnitude.

For more information regarding MapReduce, refer to the following video:

...