0 votes
in Big Data Hadoop & Spark by (11.4k points)

I have a dataframe with a few columns. Now I want to derive a new column from 2 other columns:

from pyspark.sql import functions as F
new_df = df.withColumn("new_col", F.when(df["col-1"] > 0.0 & df["col-2"] > 0.0, 1).otherwise(0))

With this I only get an exception:

py4j.Py4JException: Method and([class java.lang.Double]) does not exist

It works with just one condition like this:

new_df = df.withColumn("new_col", F.when(df["col-1"] > 0.0, 1).otherwise(0))

Does anyone know how to use multiple conditions?

I'm using Spark 1.4.

1 Answer

0 votes
by (32.3k points)

Use parentheses to enforce the desired operator precedence:

F.when((df["col-1"] > 0.0) & (df["col-2"] > 0.0), 1).otherwise(0)
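The parentheses matter because of Python operator precedence: `&` binds more tightly than comparison operators such as `>`, so without them the expression is parsed as `df["col-1"] > (0.0 & df["col-2"]) > 0.0`, and Spark ends up calling `and` on a plain `Double` — hence the `Method and([class java.lang.Double]) does not exist` exception. The same precedence rule can be seen in plain Python, with no Spark involved:

```python
# Pure Python: & evaluates before >, so the bitwise AND happens first
a, b = 5, 3

# Parsed as: a > (0 & b) > 0, i.e. a chained comparison 5 > 0 > 0
unparenthesized = a > 0 & b > 0        # -> False

# With explicit parentheses each comparison runs first, then &
parenthesized = (a > 0) & (b > 0)      # -> True

print(unparenthesized, parenthesized)
```

This is why PySpark's documentation always shows boolean column expressions fully parenthesized when combined with `&`, `|`, or `~`.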

You can also try this approach:

from pyspark.sql.functions import col

F.when((col("col-1") > 0.0) & (col("col-2") > 0.0), 1).otherwise(0)

