in Big Data Hadoop & Spark by (11.4k points)

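E.g.:

from pyspark.sql import SQLContext

# sc is the SparkContext, as already provided by the PySpark shell
sqlContext = SQLContext(sc)

sample = sqlContext.sql("select Name, age, city from user")
sample.show()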


The above statement prints the entire table on the terminal, but I want to access each row in that table using a for or while loop to perform further calculations.

1 Answer

by (32.3k points)

Using a list comprehension in Python, you can collect an entire column of values into a list in just two lines:

df = sqlContext.sql("show tables in default")

tableList = [x["tableName"] for x in df.rdd.collect()]

The above example returns a list of the tables in the database 'default', but the same pattern works for any query: change the string passed to sql() and the column name used in the comprehension.

Or more abbreviated:

tableList = [x["tableName"] for x in sqlContext.sql("show tables in default").rdd.collect()]
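As a minimal sketch of the same pattern: DataFrame.collect() already returns a list of Row objects that can be indexed by column name, so the comprehension also works without going through .rdd. The helper name column_values and the stand-in rows below are illustrative assumptions; plain dictionaries are used so the snippet runs without a Spark session.

```python
def column_values(rows, column):
    """Return the values of one column from an iterable of row-like objects.

    Each row only needs to support row[column], which both Spark Row
    objects and plain dictionaries do.
    """
    return [row[column] for row in rows]


# Stand-in data (hypothetical). With Spark, you could pass
# sqlContext.sql("show tables in default").collect() instead.
sample_rows = [{"tableName": "user"}, {"tableName": "orders"}]

table_list = column_values(sample_rows, "tableName")
print(table_list)
```

Keep in mind that collect() brings every row to the driver, so this approach suits result sets that fit comfortably in memory.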

And for your example of three columns, we can create a list of dictionaries, and then iterate through them in a for loop.

sql_text = "select name, age, city from user"

tupleList = [{name:x["name"], age:x["age"], city:x["city"]}
             for x in sqlContext.sql(sql_text).rdd.collect()]

for row in tupleList:
    print("{} is a {} year old from {}".format(
        row["name"],
        row["age"],
        row["city"]))


by (100 points)
Your last block of code has an error in it. You need to put the dictionary keys in quotes like this:

sql_text = "select name, age, city from user"
tupleList = [{"name": x["name"], "age": x["age"], "city": x["city"]}
             for x in sqlContext.sql(sql_text).rdd.collect()]

for row in tupleList:
    print("{} is a {} year old from {}".format(
        row["name"],
        row["age"],
        row["city"]))
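If the goal is further calculation rather than printing, one possible approach is to loop over the rows directly instead of building a list of dictionaries first. The sketch below assumes each row can be indexed by column name, as Spark Row objects and dictionaries both can; the function names and the sample data are hypothetical.

```python
def describe_rows(rows):
    """Yield one formatted sentence per row."""
    for row in rows:
        yield "{} is a {} year old from {}".format(
            row["name"], row["age"], row["city"])


def average_age(rows):
    """Compute the mean of the 'age' column in a single pass.

    A running total is used so that this also works on a one-shot
    iterator, not only on a list.
    """
    total = 0
    count = 0
    for row in rows:
        total += row["age"]
        count += 1
    return total / count if count else None


# Stand-in rows (hypothetical). With Spark, you could pass
# sqlContext.sql(sql_text).collect() instead.
people = [
    {"name": "Alice", "age": 30, "city": "Paris"},
    {"name": "Bob", "age": 25, "city": "Pune"},
    {"name": "Chen", "age": 41, "city": "Osaka"},
]

lines = list(describe_rows(people))
for line in lines:
    print(line)

mean_age = average_age(people)
print(mean_age)
```

For large results, DataFrame.toLocalIterator() can be passed in place of collect() so that rows are fetched one partition at a time; because it is a one-shot iterator, each function would then need its own call to it.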
