How to loop through each row of a DataFrame in PySpark

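For example:

from pyspark.sql import SQLContext

# sc is assumed to be an existing SparkContext (e.g. from the pyspark shell)
sqlContext = SQLContext(sc)

sample = sqlContext.sql("select Name, age, city from user")
sample.show()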


The above statement prints the entire table to the terminal, but I want to access each row in that table using a for or while loop to perform further calculations.

1 Answer

Using a list comprehension in Python, you can collect an entire column of values into a list in just two lines. Note that collect() brings every row to the driver, so this suits results that fit in driver memory:

df = sqlContext.sql("show tables in default")

tableList = [x["tableName"] for x in df.rdd.collect()]

The example above returns a list of the tables in the database 'default'; the same approach works for any other query by replacing the one passed to sql().

Or, more compactly:

tableList = [x["tableName"] for x in sqlContext.sql("show tables in default").rdd.collect()]
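
The same pattern can be wrapped in a small helper. The sketch below is illustrative only: column_values and demo_with_spark are hypothetical names, not part of PySpark, and the demo builds a throwaway DataFrame with a local SparkSession instead of querying a real database. It also calls df.collect() directly, which already returns a list of Row objects, so the .rdd step is optional.

```python
def column_values(rows, column):
    """Collect one column from an iterable of Row-like objects into a list.

    Works with pyspark.sql.Row objects and with plain dicts, since both
    support lookup by column name.
    """
    return [row[column] for row in rows]


def demo_with_spark():
    """Hypothetical demo; needs a working PySpark installation."""
    # Imported lazily so column_values itself has no Spark dependency.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("column-values-demo")
             .getOrCreate())
    try:
        # Stand-in data; in practice this would be spark.sql("<your query>").
        df = spark.createDataFrame([("user",), ("orders",)], ["tableName"])
        # collect() brings every row to the driver as a list of Row objects.
        return column_values(df.collect(), "tableName")
    finally:
        spark.stop()
```

Keeping the list-building step separate from the Spark call makes it easy to try the logic on plain dictionaries before running it against a real table.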

For your three-column example, we can build a list of dictionaries and then iterate over it in a for loop.

sql_text = "select name, age, city from user"

# Build one dictionary per row; the keys are quoted strings
tupleList = [{"name": x["name"], "age": x["age"], "city": x["city"]}
             for x in sqlContext.sql(sql_text).rdd.collect()]

for row in tupleList:
    print("{} is a {} year old from {}".format(
        row["name"],
        row["age"],
        row["city"]))

A comment on this answer points out that the dictionary keys must be quoted strings, as they are above. Written bare (name, age, city), Python treats them as variable names and raises a NameError unless variables with those names happen to be defined.
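
When the result is too large to hold comfortably in a driver-side list, the rows can also be consumed one at a time. A minimal sketch, assuming a DataFrame with name, age and city columns: describe_rows and demo_local_iterator are hypothetical names, Row.asDict() converts a row to a dictionary, and DataFrame.toLocalIterator() hands rows to the driver partition by partition instead of all at once.

```python
def describe_rows(rows):
    """Yield one formatted sentence per row.

    Accepts pyspark.sql.Row objects (converted with asDict()) or plain
    dicts, so the loop logic can be tried without a Spark installation.
    """
    for row in rows:
        record = row.asDict() if hasattr(row, "asDict") else dict(row)
        yield "{} is a {} year old from {}".format(
            record["name"], record["age"], record["city"])


def demo_local_iterator():
    """Hypothetical demo; needs a working PySpark installation."""
    # Imported lazily so describe_rows itself has no Spark dependency.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("row-iteration-demo")
             .getOrCreate())
    try:
        # Stand-in data; in practice this would come from spark.sql(sql_text).
        df = spark.createDataFrame(
            [("Ann", 31, "Oslo"), ("Raj", 45, "Pune")],
            ["name", "age", "city"])
        # toLocalIterator() streams rows to the driver instead of
        # materializing the whole result the way collect() does.
        return list(describe_rows(df.toLocalIterator()))
    finally:
        spark.stop()
```

For per-row calculations that do not need to happen on the driver at all, expressing them as DataFrame column operations is usually the better fit; the loop above is meant for cases where driver-side iteration is genuinely required.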
