
I'm using Spark 1.3.1.

I am trying to view the values of a Spark DataFrame column in Python. With a Spark DataFrame I can do df.collect() to view its contents, but as far as I can tell there is no such method on a Spark DataFrame column.

For example, the DataFrame df contains a column named 'zip_code'. I can do df['zip_code'], which returns a pyspark.sql.dataframe.Column, but I can't find a way to view the values inside it.
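For reference, here is a minimal sketch of what I mean (the data and names are invented for illustration; on newer Spark versions the entry point is a SparkSession, while on 1.3 it is a SQLContext):

from pyspark.sql import SparkSession

# Toy data for illustration only; on Spark 1.3 you would build the DataFrame via a SQLContext instead.
spark = SparkSession.builder.appName("zip-example").getOrCreate()
df = spark.createDataFrame([("Alice", "94105"), ("Bob", "10001")], ["name", "zip_code"])

print(df.collect())          # whole DataFrame: a list of Row objects
print(type(df["zip_code"]))  # a Column object, not the data itself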

1 Answer


You can access the underlying RDD and simply map over it:

df.rdd.map(lambda r: r.zip_code).collect()
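This gives back a plain Python list of the column values, e.g. ['94105', '10001'] with the toy data above.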

You may also use select if you don't mind the results being wrapped in Row objects:

df.select('zip_code').collect()
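If you want plain values rather than Row objects, you can unwrap them yourself, e.g.:

rows = df.select('zip_code').collect()      # list of Row objects
zip_codes = [row.zip_code for row in rows]  # plain Python list of values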

Finally, if you simply want to inspect the contents, use the show method:

df.select('zip_code').show()
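show prints a tabular preview to stdout and returns None. It accepts an optional row count, and newer Spark versions (1.6+, so not the asker's 1.3.1) also take a truncate flag so long values aren't cut off:

df.select('zip_code').show(5, truncate=False)  # first 5 rows, untruncated (Spark 1.6+)
df.select('zip_code').distinct().show()        # only the unique zip codes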
