in Big Data Hadoop & Spark by (11.5k points)

I have Spark installed properly on my machine and am able to run python programs with the pyspark modules without error when using ./bin/pyspark as my python interpreter.

However, when I attempt to run the regular Python shell, when I try to import pyspark modules I get this error:

from pyspark import SparkContext

it fails with:

ImportError: No module named pyspark

How can I fix this? Is there an environment variable I need to set to point Python to the pyspark headers/libraries/etc.? If my spark installation is /spark/, which pyspark paths do I need to include? Or can pyspark programs only be run from the pyspark interpreter?

1 Answer

Add the export line below to your .bashrc file (this assumes SPARK_HOME is already set to your Spark installation directory, e.g. /spark), and your modules should then be found correctly:

# Add the PySpark classes to the Python path:

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
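Putting it together, the .bashrc additions might look like the sketch below. The /spark path is taken from the question; adjust it to your actual installation. Note that depending on your Spark version, pyspark may also depend on the Py4J library bundled under $SPARK_HOME/python/lib, which would need to be on PYTHONPATH as well (the exact zip filename varies by version, so check that directory).

```shell
# Point SPARK_HOME at the Spark installation (per the question, /spark)
export SPARK_HOME=/spark

# Add the PySpark classes to the Python path
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
```

After editing .bashrc, run `source ~/.bashrc` (or open a new shell) so the variables take effect before starting Python.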

There is one more method: use findspark.

1. Install it from your shell:

            pip install findspark

2. In your Python shell, initialize it before importing pyspark:

            import findspark

            findspark.init()

3. Then import the necessary modules:

            from pyspark import SparkContext

            from pyspark import SparkConf
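Under the hood, findspark.init() locates your Spark installation and puts its Python libraries on sys.path, which is why the import then succeeds from a regular Python shell. A minimal hand-rolled sketch of that idea, using only the standard library (the /spark path is the hypothetical install directory from the question):

```python
import os
import sys

def add_pyspark_to_path(spark_home):
    """Mimic the core of findspark.init(): put Spark's bundled
    Python package directory on sys.path so `import pyspark`
    can resolve."""
    python_dir = os.path.join(spark_home, "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir

# Hypothetical install path from the question; adjust to yours.
path = add_pyspark_to_path("/spark")
```

This is only an illustration of the mechanism; in practice, prefer the real findspark package, which also handles locating SPARK_HOME and the bundled Py4J dependency for you.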

 

With either approach in place, the Spark modules should import without errors.

