Hadoop is a set of open-source utilities for tackling the Big Data problem, built around distributed filesystems and parallel computing. It is the industry-standard solution for storing data robustly across many nodes via HDFS (the Hadoop Distributed File System), together with an optional computational component known as MapReduce, a software framework for running batch processing jobs.
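The MapReduce model mentioned above can be illustrated with a word-count sketch in plain Python. This is not Hadoop's actual API, only a minimal simulation of its three phases (map, shuffle, reduce); the input lines and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: one "record" per line, standing in for HDFS file splits.
lines = ["big data needs big tools", "hadoop stores big data"]

# Map phase: each mapper emits (word, 1) pairs for its input split.
def mapper(line):
    return [(word, 1) for word in line.split()]

# Shuffle phase: the framework groups intermediate pairs by key.
grouped = defaultdict(list)
for key, value in chain.from_iterable(mapper(line) for line in lines):
    grouped[key].append(value)

# Reduce phase: each reducer aggregates the values for one key.
counts = {word: sum(values) for word, values in grouped.items()}
# counts["big"] == 3, counts["data"] == 2
```

In real Hadoop, the map and reduce functions run as distributed tasks on the nodes that hold the data, and the shuffle moves intermediate pairs across the network between them.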
Apache Spark is an open-source framework developed for cluster computing; for in-memory workloads it can run up to 100 times faster than Hadoop MapReduce. Spark provides built-in data parallelism and fault tolerance: in simple terms, it offers an easy interface for processing vast amounts of data across large clusters in parallel, recovering automatically from node failures without losing data. The framework was originally developed at the University of California, Berkeley, and is now maintained by the Apache Software Foundation.
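The data-parallelism idea behind Spark can be sketched in standard-library Python: split a dataset into partitions, process each partition independently, then merge the partial results. This is only an analogy, not Spark's RDD/DataFrame API; the dataset, partition count, and function names are assumptions, and a thread pool is used for portability even though real Spark distributes partitions across executor processes on a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset and a per-partition task (sum of squares).
data = list(range(1, 101))

def process_partition(partition):
    # In Spark, this logic would run on an executor holding one partition.
    return sum(x * x for x in partition)

# Split the data into 4 partitions, as Spark splits a dataset across a cluster.
n = 4
partitions = [data[i::n] for i in range(n)]

# Process the partitions concurrently and merge the partial results.
with ThreadPoolExecutor(max_workers=n) as pool:
    partial_sums = list(pool.map(process_partition, partitions))

total = sum(partial_sums)
# total == 338350, the sum of squares of 1..100
```

If one worker fails here, the whole computation fails; Spark, by contrast, tracks how each partition was derived and recomputes only the lost partitions, which is the fault tolerance the paragraph above refers to.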