Let’s assume that your dependencies are listed in requirements.txt. To package and zip them, I would suggest running the following at the command line:
pip install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
The cd dependencies step above is crucial: it ensures that the modules end up at the top level of the zip file rather than nested inside a dependencies/ directory.
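If you want to sanity-check the layout, one quick way (a sketch assuming Python 3 and that dependencies.zip sits in the current directory) is to list the first few entries of the archive and confirm that the package directories appear at the root:

import zipfile

# Print the first few entries; package directories (e.g. "somepkg/")
# should appear at the top level, not under a "dependencies/" prefix.
with zipfile.ZipFile("dependencies.zip") as zf:
    for name in zf.namelist()[:10]:
        print(name)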
Next, submit the job via:
spark-submit --py-files dependencies.zip spark_job.py
Now, the --py-files directive ships the zip file to the Spark workers but does not add it to the PYTHONPATH. So, to add the dependencies to the PYTHONPATH and fix the ImportError, you must add the following line to the Spark job, spark_job.py:
sc.addPyFile("dependencies.zip")
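For context, here is a minimal sketch of how spark_job.py might look with that line in place. The package name bundled_pkg and the function do_something are hypothetical stand-ins for whatever actually appears in your requirements.txt:

from pyspark import SparkContext

sc = SparkContext(appName="spark_job")

# Ship the zipped dependencies and put them on the PYTHONPATH of the
# executors, so the bundled packages can be imported inside tasks.
sc.addPyFile("dependencies.zip")

# Import the bundled package after addPyFile so the workers can resolve it
# when the mapped function runs.
import bundled_pkg  # hypothetical dependency from requirements.txt

def process_record(x):
    return bundled_pkg.do_something(x)  # hypothetical call into the dependency

print(sc.parallelize(range(10)).map(process_record).collect())

sc.stop()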