
I am running a simple word-count app in PySpark:

from operator import add
f = sc.textFile("")
wc = f.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)

I want to view the RDD contents using the foreach action:

wc.foreach(print)

This throws a syntax error:

SyntaxError: invalid syntax

What am I missing?

1 Answer


You get this error because in Python 2 (2.6 included) print is a statement, not a function, so it cannot be passed as an argument and wc.foreach(print) is not valid syntax.

You can either import print_function from the __future__ module so that print is treated as a function:

>>> from __future__ import print_function
>>> wc.foreach(print)


Or define a helper function that performs the print:

>>> from operator import add
>>> f = sc.textFile("")
>>> wc = f.flatMap(lambda x: x.split(' ')).map(lambda x: (x, 1)).reduceByKey(add)
>>> def g(x):
...     print x
...
>>> wc.foreach(g)
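
Both options come down to handing foreach something callable. The sketch below illustrates that idea in plain Python 3, without Spark. The foreach helper and the pairs list are hypothetical stand-ins for RDD.foreach and for the (word, count) tuples in wc; they are not part of the PySpark API. On Python 2 the first option would additionally need from __future__ import print_function as the first statement of the file.

```python
def foreach(items, func):
    """Hypothetical stand-in for RDD.foreach: call func once per element."""
    for item in items:
        func(item)


def g(x):
    """Helper that wraps the print, as in the second option above."""
    print(x)


# Hypothetical sample data shaped like the (word, count) pairs in wc.
pairs = [("spark", 2), ("hadoop", 1)]

# Option 1: print is an ordinary function object, so it can be passed directly.
foreach(pairs, print)

# Option 2: pass a named helper that does the printing.
foreach(pairs, g)
```

Either call prints each tuple on its own line. With a real RDD the same two callables would be passed to wc.foreach instead.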

Note: foreach executes on the worker nodes, so the output may not appear in your driver/shell (it probably will in local mode, but not when running on a cluster).

Therefore, it is better to use collect() to bring the RDD contents back to the driver and print them there:

>>> for x in wc.collect():
...     print x
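
One caveat: collect() loads the whole RDD into driver memory, which can be a problem for large datasets. A possible alternative is to print only the first few elements with take(n). The sketch below assumes that is acceptable for inspection purposes; show_rdd and its limit parameter are hypothetical names, and the only RDD method it relies on is take.

```python
def show_rdd(rdd, limit=20):
    """Print up to `limit` elements of `rdd` on the driver and return them.

    `rdd` is assumed to be a PySpark RDD (or any object with a take(n) method).
    """
    rows = rdd.take(limit)  # take(n) returns a plain Python list on the driver
    for row in rows:
        print(row)
    return rows
```

In the shell session above this would be called as show_rdd(wc, 10), which prints at most ten (word, count) pairs in the driver regardless of where the data lives.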
