0 votes
in Big Data Hadoop & Spark by (11.4k points)

I have a DataFrame with a column of type String, and I want to change that column to Double type in PySpark.

This is what I did:

toDoublefunc = UserDefinedFunction(lambda x: x,DoubleType())
changedTypedf = joindf.withColumn("label",toDoublefunc(joindf['show']))

Is this the right way to do it? I get an error when I run the result through Logistic Regression, and I wonder whether this conversion is the cause.

1 Answer

0 votes
by (32.3k points)

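The UDF in your question returns its input unchanged (lambda x: x), so it never converts the string to a double; declaring DoubleType() as the return type does not perform the conversion for you, and a Python UDF whose return value does not match its declared type typically yields null. That may well be the source of your error. I would suggest casting the column directly instead: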


from pyspark.sql.types import DoubleType

changedTypedf = joindf.withColumn("show", joindf["show"].cast(DoubleType()))

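Because the output column has the same name as the input column (show), the cast replaces the column in place and no extra column is added. If you want a separate label column as in your question, pass "label" as the first argument to withColumn instead.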
