
I'm new to Apache Spark, and I installed apache-spark with Homebrew on my MacBook.

I would like to start experimenting with it in order to learn more about MLlib. However, I use PyCharm to write my Python scripts. The problem is that when I try to import pyspark in PyCharm, it cannot find the module. I tried adding the path to PyCharm as follows:

[Image: cant link pycharm with spark]

Then from a blog I tried this:

import os
import sys

# Path to the Spark installation folder
os.environ['SPARK_HOME'] = "/Users/user/Apps/spark-1.5.2-bin-hadoop2.4"

# Append pyspark to the Python path
sys.path.append("/Users/user/Apps/spark-1.5.2-bin-hadoop2.4/python/pyspark")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Successfully imported Spark Modules")

except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)

I still cannot use PySpark from PyCharm. Any idea how to "link" PyCharm with PySpark?

1 Answer


Prerequisites:

1. PyCharm

2. Python

3. Spark

First, install the PySpark package from within PyCharm:

  • Go to File -> Settings -> Project Interpreter (on macOS this is typically under PyCharm -> Preferences)

  • Click the install button and search for PySpark

  • Click the Install Package button
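Once the package is installed, one quick way to confirm that the selected interpreter can actually see it is to ask Python where the module would be loaded from. The following is only a minimal sketch; the helper name find_module_location is hypothetical and is not part of PySpark or PyCharm.

```python
import importlib.util


def find_module_location(name):
    """Return the file a top-level module would be loaded from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


location = find_module_location("pyspark")
if location is None:
    print("pyspark is not visible to this interpreter")
else:
    print("pyspark found at", location)
```

Run it with the same interpreter that the PyCharm project uses. If it reports that pyspark is not visible, the package was probably installed into a different interpreter.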

If you use your own Spark installation instead (for example, one installed with Homebrew), configure PyCharm manually. Start by creating a Run configuration:

  • Go to Run -> Edit configurations

  • Add new Python configuration

  • Set Script path so it points to the script you want to execute

  • Edit Environment variables field so it contains at least:

SPARK_HOME - should point to the directory containing the Spark installation. That directory should contain subdirectories such as bin (with spark-submit, spark-shell, etc.) and conf (with spark-defaults.conf, spark-env.sh, etc.)

PYTHONPATH - should contain $SPARK_HOME/python and, if Py4J is not otherwise available, $SPARK_HOME/python/lib/py4j-some-version-src.zip. some-version should match the Py4J version used by the given Spark installation (Py4J 0.8.2.1 for Spark 1.5, 0.9 for 1.6, 0.10.3 for 2.0, 0.10.4 for 2.1 and 2.2, 0.10.6 for 2.3)

  • Apply the settings
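Because the Py4J version changes between Spark releases, it can be easier to look the zip file up than to type its name by hand. The sketch below assumes the directory layout described above (bin, conf, python/lib) and builds the two values to paste into the Environment variables field. The function name spark_run_config_env is hypothetical.

```python
import glob
import os


def spark_run_config_env(spark_home):
    """Build SPARK_HOME and PYTHONPATH values for a Spark directory with the usual layout."""
    # A Spark installation directory is expected to contain bin and conf.
    for subdir in ("bin", "conf"):
        if not os.path.isdir(os.path.join(spark_home, subdir)):
            raise ValueError(
                f"{spark_home} has no {subdir} directory; is it a Spark installation?"
            )

    python_dir = os.path.join(spark_home, "python")
    # Look up the bundled Py4J zip instead of hard-coding its version.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))

    return {
        "SPARK_HOME": spark_home,
        "PYTHONPATH": os.pathsep.join([python_dir] + py4j_zips),
    }
```

Calling spark_run_config_env with the path of your Spark directory returns a dictionary whose two entries can be copied into the Run configuration. If no Py4J zip is found, PYTHONPATH contains only the python directory, which is enough when Py4J is already installed in the interpreter.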

Add the PySpark library to the interpreter path (required for code completion):

  • Go to File -> Settings -> Project Interpreter

  • Open the settings for the interpreter you want to use with Spark

  • Edit the interpreter paths so they include $SPARK_HOME/python (and Py4J if required)

  • Save the settings

Finally, use the newly created configuration to run your script.
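As for why the script in the question fails: it appends .../python/pyspark to sys.path, which is the package directory itself. Python resolves "import pyspark" by looking for a pyspark directory inside each sys.path entry, so the entry that belongs on the path is the parent directory, $SPARK_HOME/python, plus the Py4J zip if Py4J is not installed separately. If you prefer to set this up in the script rather than in the Run configuration, a minimal sketch could look like the following. The helper name add_spark_to_sys_path is hypothetical, and the script assumes SPARK_HOME is already set in the environment.

```python
import glob
import os
import sys


def add_spark_to_sys_path(spark_home):
    """Append $SPARK_HOME/python and any bundled Py4J zip to sys.path; return the entries added."""
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    added = []
    for path in [python_dir] + py4j_zips:
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


spark_home = os.environ.get("SPARK_HOME")
if spark_home is None:
    print("SPARK_HOME is not set")
else:
    add_spark_to_sys_path(spark_home)
    try:
        from pyspark import SparkConf, SparkContext
        print("Successfully imported Spark modules")
    except ImportError as e:
        print("Cannot import Spark modules:", e)
```

The Run configuration approach keeps installation-specific paths out of the source code, so this is best treated as a fallback for quick experiments.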
