Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Big Data Hadoop & Spark by (11.4k points)

I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems loading the pyspark module into ipython. I'm getting the following error: 

In [1]: import pyspark
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark

/usr/local/spark/python/pyspark/__init__.py in <module>()
     61
     62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
     64 from pyspark.sql import SQLContext
     65 from pyspark.rdd import RDD

/usr/local/spark/python/pyspark/context.py in <module>()
     28 from pyspark.conf import SparkConf
     29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
     31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
     32     PairDeserializer, CompressedSerializer

/usr/local/spark/python/pyspark/java_gateway.py in <module>()
     24 from subprocess import Popen, PIPE
     25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
     27
     28

ImportError: No module named py4j.java_gateway

1 Answer

0 votes
by (32.3k points)

In order to resolve this error you need add following paths in .bashrc:

export SPARK_HOME=/home/a141890/apps/spark

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

Browse Categories

...