Apache Spark is an open-source distributed cluster-computing framework, and it can run on Hadoop. Hadoop provides distributed storage (HDFS) and cluster resource management (YARN). Spark can use both, but neither is required: they can be substituted by other storage systems and cluster managers that Spark supports.
Since Spark does not ship with its own distributed storage system, it has to depend on one of the following storage systems for distributed computing:
S3 – Suits non-urgent batch jobs; a good fit for specific use cases where data locality isn't critical.
Cassandra – Well suited to streaming-data analysis, but overkill for plain batch jobs.
HDFS – A great fit for batch jobs, without compromising on data locality.
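To make the choice concrete, here is a minimal sketch of how Spark addresses each of these storage layers. It assumes a running Spark deployment, the `s3a` filesystem connector for S3, and the DataStax `spark-cassandra-connector` for Cassandra; the hostnames, paths, bucket, table, and keyspace names are all hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-demo").getOrCreate()

# HDFS: data-local batch reads (namenode host/port are placeholders)
df_hdfs = spark.read.parquet("hdfs://namenode:8020/data/events")

# S3 via the s3a connector: remote object storage, no data locality
df_s3 = spark.read.parquet("s3a://my-bucket/data/events")

# Cassandra via the spark-cassandra-connector package
df_cas = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(table="events", keyspace="analytics")
          .load())
```

In every case the application code stays the same DataFrame API; only the source format or URI scheme changes, which is what makes the storage layer swappable.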
You can run Spark in three different modes, on the following cluster managers: