I have a DataFrame with a column of type String, and I want to change that column's type to Double in PySpark.

This is what I did:

toDoublefunc = UserDefinedFunction(lambda x: x,DoubleType())
changedTypedf = joindf.withColumn("label",toDoublefunc(joindf['show']))

Is this the right way to do it? I am getting an error when I run the result through Logistic Regression, and I wonder whether this conversion is the cause.

Your UDF declares DoubleType as its return type, but the lambda returns the string unchanged, so no conversion actually takes place. I would suggest casting the column directly instead:


from pyspark.sql.types import DoubleType
changedTypedf = joindf.withColumn("show", joindf["show"].cast(DoubleType()))

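Because the output column has the same name as the input column ("show"), the column is replaced in place: the name is preserved and no extra column is added. If you need the column to be called "label", as in your original code, pass "label" as the first argument to withColumn instead.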

