I'm new to Apache Spark and have installed apache-spark with Homebrew on my MacBook.

I would like to start playing with it in order to learn more about MLlib. However, I use PyCharm to write scripts in Python. The problem is that when I try to import pyspark in PyCharm, it cannot find the module. I tried adding the path to PyCharm as follows:

[Screenshot: can't link PyCharm with Spark]

Then from a blog I tried this:

import os
import sys

# Path for spark source folder
os.environ["SPARK_HOME"] = "..."

# Append pyspark to Python Path
sys.path.append("...")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Successfully imported Spark Modules")

except ImportError as e:
    print("Can not import Spark Modules", e)

I still cannot use PySpark with PyCharm. Any idea how to "link" PyCharm with apache-pyspark?

1. Pycharm

2. Python

3. Spark

First, in the PyCharm interface, install PySpark by following these steps:

  • Go to File -> Settings -> Project Interpreter

  • Click on install button and search for PySpark

  • Click on install package button.

Alternatively, to set things up manually with a user-provided Spark installation, create a Run configuration:

  • Go to Run -> Edit configurations

  • Add new Python configuration

  • Set Script path so it points to the script you want to execute

  • Edit the Environment variables field so that it contains at least:

SPARK_HOME - it should point to the directory with the Spark installation. That directory should contain subdirectories such as bin (with spark-submit, spark-shell, etc.) and conf (with spark-defaults.conf, etc.).

PYTHONPATH - it should contain $SPARK_HOME/python and, optionally, the Py4J archive under $SPARK_HOME/python/lib/ if Py4J is not available otherwise. The some-version part of the archive's name should match the Py4J version used by the given Spark installation (Py4J version - Spark version: ... - 1.5, 0.9 - 1.6, 0.10.3 - 2.0, 0.10.4 - 2.1, 0.10.4 - 2.2, 0.10.6 - 2.3).

  • Apply the settings
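The values for the two variables above can also be worked out with a few lines of Python, which helps when checking what to paste into the Environment variables field. This is only a minimal sketch: it assumes the usual layout of a Spark distribution (a python directory with a lib subdirectory holding the Py4J archive), and both the helper name spark_python_paths and the py4j-*.zip pattern are my own assumptions, not part of Spark.

```python
import glob
import os


def spark_python_paths(spark_home):
    """Return the entries PYTHONPATH needs for the Spark install at spark_home.

    Hypothetical helper: assumes the Py4J archive, if bundled, lives in
    $SPARK_HOME/python/lib and matches the pattern py4j-*.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    # Pick up whichever Py4J archive ships with this installation,
    # so the version matches the Spark release automatically.
    py4j_archives = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))
    return [python_dir] + py4j_archives


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME is not set")
    else:
        # os.pathsep is ":" on macOS and Linux, ";" on Windows.
        print("PYTHONPATH=" + os.pathsep.join(spark_python_paths(spark_home)))
```

Globbing for the archive avoids hard-coding a Py4J version, so the same snippet should work across the Spark releases listed above, provided the layout assumption holds.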

Add the PySpark library to the interpreter path (required for code completion):

  • Go to File -> Settings -> Project Interpreter

  • Open the settings for the interpreter you want to use with Spark

  • Edit the interpreter paths so that they contain the path to $SPARK_HOME/python (and Py4J if required)

  • Save the settings


Finally, use the newly created configuration to run your script.
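To confirm that the configuration works, a small script along these lines can be run with it. It is only a sketch: check_pyspark is a hypothetical helper name, and it merely reports whether the interpreter can locate the pyspark and py4j modules, without starting Spark.

```python
import importlib.util
import os


def check_pyspark(modules=("pyspark", "py4j")):
    """Map each module name to True if the interpreter can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("SPARK_HOME =", os.environ.get("SPARK_HOME", "<not set>"))
    for name, found in check_pyspark().items():
        print(name, "found" if found else "NOT found on sys.path")
```

If either module is reported as not found, the PYTHONPATH entry in the run configuration (or the installed package) is the first thing to re-check.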

