
For example:

from pyspark.sql import SQLContext

# sc is an existing SparkContext
sqlContext = SQLContext(sc)

sample = sqlContext.sql("select Name, age, city from user")
sample.show()


The statement above prints the entire table to the terminal, but I want to access each row of that table with a for or while loop to perform further calculations.

1 Answer


Using a list comprehension in Python, you can collect an entire column of values into a list in just two lines:

df = sqlContext.sql("show tables in default")

tableList = [x["tableName"] for x in df.rdd.collect()]

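The example above returns a list of the tables in the database 'default'; you can adapt it to other data by replacing the query passed to sql() and the column name used in the comprehension. Keep in mind that collect() brings every row to the driver, so this approach suits results that fit in memory.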

Or, more compactly:

tableList = [x["tableName"] for x in sqlContext.sql("show tables in default").rdd.collect()]
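If you need this for more than one column, the same idea can be wrapped in a small helper. This is only a sketch: column_to_list is a hypothetical name (not part of Spark), and it assumes df is a DataFrame such as the one returned by sqlContext.sql().

```python
def column_to_list(df, column):
    """Return every value of `column` as a plain Python list.

    Assumes `df` is a PySpark DataFrame. select() narrows the result to
    the one column we need, and collect() brings those rows to the driver.
    """
    return [row[column] for row in df.select(column).collect()]


# Hypothetical usage with the query from the answer:
# tableList = column_to_list(sqlContext.sql("show tables in default"), "tableName")
```

Selecting the column before collecting means only that column is transferred to the driver, which may matter for wide tables.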

For your example with three columns, we can create a list of dictionaries and then iterate through it in a for loop:

sql_text = "select name, age, city from user"

# Build one dictionary per row; the dictionary keys must be quoted strings.
tupleList = [{"name": x["name"], "age": x["age"], "city": x["city"]}
             for x in sqlContext.sql(sql_text).rdd.collect()]

for row in tupleList:
    print("{} is a {} year old from {}".format(
        row["name"],
        row["age"],
        row["city"]))
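Since the question asks about looping over rows for further calculations, here is a minimal sketch of one way to do that without building the whole list first. It assumes df is a PySpark DataFrame with a numeric age column; iter_row_dicts and average_age are hypothetical helper names, while toLocalIterator() and Row.asDict() are existing PySpark methods.

```python
def iter_row_dicts(df):
    """Yield each row of `df` as a plain dictionary, one row at a time.

    toLocalIterator() streams rows to the driver partition by partition
    instead of collecting the whole table at once.
    """
    for row in df.toLocalIterator():
        yield row.asDict()


def average_age(df):
    """Example calculation: the mean of the `age` column, or None if empty."""
    total = 0
    count = 0
    for record in iter_row_dicts(df):
        total += record["age"]
        count += 1
    return total / count if count else None


# Hypothetical usage with the query from the question:
# df = sqlContext.sql("select name, age, city from user")
# print(average_age(df))
```

For simple aggregates like this one, letting Spark compute the result (for example with a SQL avg) is usually preferable; a driver-side loop is mainly useful when the per-row logic is hard to express in SQL.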
