Requirements
PyCharm
Python
Spark
In your PyCharm interface:
Install PySpark with the following steps:
Go to File -> Settings -> Project Interpreter
Click the install (+) button, search for pyspark, then click Install Package.
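Equivalently, the same PyPI package can be installed from a terminal; a minimal sketch, assuming `python` resolves to the interpreter your PyCharm project uses:

```shell
# Installs the same package as the Install Package button,
# into whichever interpreter "python" resolves to.
python -m pip install pyspark
```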
Manually
User-provided installation of Spark
Run configurations
Go to Run -> Edit Configurations. Click + on the left bar and select a new Python configuration. Set Script path to point at the script you want to run this time, then edit Environment variables so that it contains at least:
SPARK_HOME: it has to refer to the Spark installation directory, which contains directories such as: bin (with files like spark-submit, spark-shell, etc.) and conf (with files like spark-defaults.conf, spark-env.sh, etc.).
PYTHONPATH: it should include $SPARK_HOME/python and optionally $SPARK_HOME/python/lib/py4j-some-version-src.zip (in case it is not available otherwise). The Py4J version has to correspond to the one used by the given Spark installation. For example:
Py4J 0.8.2.1 - Spark 1.5
Py4J 0.9 - Spark 1.6
Py4J 0.10.3 - Spark 2.0
Py4J 0.10.4 - Spark 2.1
Py4J 0.10.4 - Spark 2.2
Py4J 0.10.6 - Spark 2.3
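The two variables above can also be assembled programmatically; a minimal sketch in Python, assuming a hypothetical Spark install at /opt/spark (adjust to your machine):

```python
import glob
import os

# Hypothetical Spark installation directory -- the value you would put
# in SPARK_HOME for the run configuration.
spark_home = "/opt/spark"

# Entries PYTHONPATH needs: $SPARK_HOME/python plus the bundled Py4J zip.
pythonpath_entries = [os.path.join(spark_home, "python")]
# The Py4J zip name embeds its version, so glob for it rather than
# hard-coding a version that may not match your Spark install.
pythonpath_entries += glob.glob(
    os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))

print("SPARK_HOME=" + spark_home)
print("PYTHONPATH=" + os.pathsep.join(pythonpath_entries))
```

Paste the printed values into the Environment variables field of the run configuration.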
Interpreter configuration
To sum up, add the PySpark library to the interpreter's path:
File -> Settings -> Project Interpreter
Open the configuration for the interpreter you would like to use with Spark
Edit the interpreter paths so they include the path to $SPARK_HOME/python (and the Py4J path, if needed)
Save the configuration
Finally
Use your newly created configuration to run your script.
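Before trying a real Spark job, a small hypothetical sanity-check script (not part of the steps above) can confirm that the run configuration actually exported the variables this guide relies on:

```python
import os


def check_spark_env(env=os.environ):
    """Return a human-readable status for the SPARK_HOME setup."""
    spark_home = env.get("SPARK_HOME")
    if spark_home is None:
        return "SPARK_HOME is not set -- revisit the Environment variables field"
    # A valid SPARK_HOME contains bin/spark-submit, as described above.
    spark_submit = os.path.join(spark_home, "bin", "spark-submit")
    if not os.path.exists(spark_submit):
        return f"missing {spark_submit} -- check SPARK_HOME"
    return "spark-submit found"


print(check_spark_env())
```

Run it with the newly created configuration; if it reports that SPARK_HOME is unset or spark-submit is missing, revisit the Environment variables field before running your actual script.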