

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

sample = sqlContext.sql("select Name, age, city from user")

The above statement prints the entire table on the terminal, but I want to access each row in that table using a for or while loop to perform further calculations.

1 Answer


Using a list comprehension in Python, you can collect an entire column of values into a list in just two lines:

df = sqlContext.sql("show tables in default")

tableList = [x["tableName"] for x in df.rdd.collect()]

In the above example, we return a list of the tables in the database 'default', but the same approach works for any query: just replace the statement passed to sql().

Or, more concisely:

tableList = [x["tableName"] for x in sqlContext.sql("show tables in default").rdd.collect()]

For your example with three columns, we can build a list of dictionaries and then iterate through them in a for loop:

sql_text = "select name, age, city from user"

# One dict per row; the keys must be quoted strings
dictList = [{"name": x["name"], "age": x["age"], "city": x["city"]}
            for x in sqlContext.sql(sql_text).rdd.collect()]

for row in dictList:
    print("{} is a {} year old from {}".format(
        row["name"], row["age"], row["city"]))

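Since the question also asks about a while loop and further calculations, here is a minimal sketch of both. It is illustrative only: `average_age_by_city` is a hypothetical helper, the per-city average is just an example calculation, and it assumes each row supports `row["city"]` and `row["age"]` lookup (as Spark `Row` objects and the dictionaries above do) and that `age` is never null.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def average_age_by_city(rows: Iterable[Any]) -> Dict[Any, float]:
    """Average `age` per `city`, reading the rows one at a time with a while loop.

    Hypothetical helper: assumes each row supports row["city"] and row["age"]
    lookup and that `age` is never null.
    """
    totals: Dict[Any, int] = defaultdict(int)  # sum of ages per city
    counts: Dict[Any, int] = defaultdict(int)  # number of rows per city

    it = iter(rows)
    row = next(it, None)  # None means the iterator is exhausted
    while row is not None:
        totals[row["city"]] += row["age"]
        counts[row["city"]] += 1
        row = next(it, None)

    return {city: totals[city] / counts[city] for city in totals}
```

With Spark 2.0 or later you could call `average_age_by_city(sqlContext.sql(sql_text).toLocalIterator())`. `DataFrame.toLocalIterator()` brings rows to the driver one partition at a time instead of all at once like `collect()`. For large tables it is usually better to let Spark do the aggregation itself, e.g. `sqlContext.sql(sql_text).groupBy("city").avg("age")`.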

Welcome to Intellipaat Community. Get your technical queries answered by top developers!