
Currently I'm studying the Apache Spark and Apache Ignite frameworks.

Some principal differences between them are described in this article, ignite vs spark, but I realized that I still don't understand their purposes.

1 Answer


Apache Ignite is an open-source in-memory data fabric that provides a wide variety of computing solutions, including an in-memory data grid, a compute grid, streaming, and acceleration solutions for Hadoop and Spark. Apache Spark is an open-source, large-scale data processing framework. Although both Ignite and Spark are in-memory computing solutions, they are complementary projects that often target different use cases. In many cases, they are used together to achieve better performance and functionality.

Apache Ignite is a general-purpose, in-memory data fabric. Ignite can distribute and cache data in RAM across multiple servers, which speeds up processing and lets applications scale out. Ignite supports any SQL-based RDBMS, Hadoop HDFS, NoSQL, and Amazon S3 as optional data sources. It powers both new and existing applications in a distributed, massively parallel architecture on affordable, industry-standard hardware.
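
As a rough illustration of caching data in RAM across several servers with an optional data source behind the cache, here is a toy sketch in plain Python. It is not the Ignite API: `PartitionedCache`, `load_from_database`, and `slow_database` are hypothetical names, the "nodes" are just dictionaries in one process, and the hash-modulo placement is an assumption made for brevity rather than how Ignite actually assigns keys to nodes.

```python
import zlib


class PartitionedCache:
    """Toy in-memory cache that spreads keys over several 'nodes' (plain dicts)."""

    def __init__(self, node_count, loader=None):
        self._nodes = [{} for _ in range(node_count)]
        self._loader = loader  # optional function that reads from a backing source

    def _node_for(self, key):
        # Stable hash so the same key always maps to the same node.
        index = zlib.crc32(str(key).encode("utf-8")) % len(self._nodes)
        return self._nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        node = self._node_for(key)
        if key not in node and self._loader is not None:
            # Cache miss: read from the backing source and keep the result in RAM.
            value = self._loader(key)
            if value is not None:
                node[key] = value
        return node.get(key)

    def node_sizes(self):
        return [len(node) for node in self._nodes]


# Hypothetical backing source, standing in for an RDBMS or other external store.
slow_database = {"user:1": "Ada", "user:2": "Linus", "user:3": "Grace"}
lookups = []


def load_from_database(key):
    lookups.append(key)  # record each trip to the backing source
    return slow_database.get(key)


cache = PartitionedCache(node_count=3, loader=load_from_database)
first = cache.get("user:1")   # miss: loaded from the backing source
second = cache.get("user:1")  # hit: served from memory
```

In this sketch the second lookup never reaches `load_from_database`, which is the basic reason an in-memory layer in front of a slower store can help.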

Apache Spark was introduced as a general engine for large-scale OLAP processing. It focuses specifically on non-transactional, read-only, event-based data and on enhancing big data analytics. It also includes a machine learning library (MLlib). Apache Spark is effective at rapidly processing data in memory, but unlike Ignite, which can work on real-time operational data, Spark needs the data to be loaded (ETL-ed) from other operational systems first and then processes it in offline mode.

Major Differences

Although Apache Spark and Apache Ignite both carry the power of in-memory computing, they still have some conceptual differences:

  • Spark doesn’t store data. It loads data from other storage systems, usually disk-based, for processing, and discards the data as soon as processing is finished. Ignite, on the other hand, provides a distributed in-memory key-value store (a distributed cache or data grid) with SQL querying capabilities and ACID transactions.

  • Spark is used for non-transactional, read-only data (RDDs don’t support in-place mutation), while Ignite supports both fully ACID-compliant transactional (OLTP) and non-transactional (OLAP) workloads.
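
The contrast in the two points above can be illustrated with a toy model in plain Python. This is only a sketch: `ImmutableDataset`, `TransactionalStore`, and `transfer` are hypothetical names invented for illustration and are not the Spark or Ignite APIs. The first class mimics how an RDD-style transformation returns a new dataset instead of changing the old one; the second mimics a key-value store whose updates happen in place and are rolled back together if a transaction fails.

```python
from contextlib import contextmanager


class ImmutableDataset:
    """Toy stand-in for an RDD-style dataset: transformations return new objects."""

    def __init__(self, records):
        self._records = tuple(records)  # a tuple cannot be modified in place

    def map(self, fn):
        # Build a new dataset; the original is left untouched.
        return ImmutableDataset(fn(r) for r in self._records)

    def collect(self):
        return list(self._records)


class TransactionalStore:
    """Toy stand-in for an in-memory key-value store with all-or-nothing updates."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value  # in-place mutation

    @contextmanager
    def transaction(self):
        snapshot = dict(self._data)  # copy taken so changes can be undone
        try:
            yield self
        except Exception:
            self._data = snapshot  # roll back every change made in the block
            raise


def transfer(store, src, dst, amount):
    """Move `amount` between two accounts as a single all-or-nothing step."""
    with store.transaction():
        store.put(src, store.get(src) - amount)
        if store.get(src) < 0:
            raise ValueError("insufficient funds")
        store.put(dst, store.get(dst) + amount)


# Processing-engine style: derive new data, never edit the input.
prices = ImmutableDataset([10, 20, 30])
doubled = prices.map(lambda p: p * 2)

# Data-store style: update records in place inside a transaction.
accounts = TransactionalStore()
accounts.put("alice", 100)
accounts.put("bob", 50)
transfer(accounts, "alice", "bob", 30)
```

If a later `transfer` raises (for example, because the source balance would go negative), the store in this sketch restores its snapshot, so neither account is left half-updated. A real system would also need isolation between concurrent transactions and durability, which this single-process toy does not attempt.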

...