in Big Data Hadoop & Spark by (11.4k points)

>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]
>>> a.join(b, a.id==b.id, 'outer')
DataFrame[id: bigint, julian_date: string, user_id: bigint, id: bigint, quan_created_money: decimal(10,0), quan_created_cnt: bigint]


There are now two id: bigint columns, and I want to delete one of them. How can I do that?

1 Answer

by (32.3k points)

Since Spark 1.4, PySpark DataFrames have a drop(col) method, which returns a new DataFrame with the given column removed.

You can pass the column in two ways, either by name or as a Column object:

  • df.drop('a_column').collect()

  • df.drop(df.a_column).collect()

To drop several columns at once, pass their names as separate arguments, for example by unpacking a list:

columns_to_drop = ['a column', 'b column']

df = df.drop(*columns_to_drop)
