
Could someone help me solve this problem I have with Spark DataFrame?

When I call myFloatRdd.toDF(), I get an error:

TypeError: Can not infer schema for type: type 'float'

I don't understand why...

Example:

myFloatRdd = sc.parallelize([1.0,2.0,3.0])
df = myFloatRdd.toDF()

1 Answer


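RDD.toDF() delegates to SparkSession.createDataFrame, which requires each element of the RDD to be a Row, tuple, or list unless a schema with a DataType is provided. A bare float is none of these, so Spark cannot infer a schema for it. I would suggest wrapping each float in a Row like this: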

from pyspark.sql import Row

row = Row("val") # Or some other column name

myFloatRdd.map(row).toDF()
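If you would rather not import Row, wrapping each value in a one-element tuple should work as well. This is only a sketch: the helper names wrap_scalar and floats_to_df are made up for illustration, and spark is assumed to be an already created SparkSession.

```python
def wrap_scalar(x):
    """Wrap a scalar in a 1-tuple so Spark can treat it as a one-column row."""
    return (x,)


def floats_to_df(spark, values, column="val"):
    """Hypothetical helper: build a one-column DataFrame from plain floats.

    `spark` is assumed to be an existing SparkSession.
    """
    rdd = spark.sparkContext.parallelize(values)
    # Each element is now a tuple, so the schema can be inferred;
    # Python floats are inferred as DoubleType.
    return rdd.map(wrap_scalar).toDF([column])
```

With a live session you would call it as floats_to_df(spark, [1.0, 2.0, 3.0]).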

To create a DataFrame from a list of scalars without wrapping each value, call SparkSession.createDataFrame directly and pass an atomic type as the schema. Spark then puts the values in a single column named value:

from pyspark.sql.types import FloatType

df = spark.createDataFrame([1.0, 2.0, 3.0], FloatType())

df.show()

## +-----+

## |value|

## +-----+

## |  1.0|

## |  2.0|

## |  3.0|

## +-----+
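The same call also accepts an RDD, so the myFloatRdd from the question can be passed in as is. The sketch below uses the datatype string "double" rather than FloatType, because Python floats are 64-bit and FloatType is a 32-bit type; for values like 1.0, 2.0, 3.0 the choice makes no visible difference. The function name rdd_to_double_df is made up for illustration, and spark is assumed to be an existing SparkSession.

```python
def rdd_to_double_df(spark, rdd):
    """Hypothetical helper: one-column DataFrame from an RDD of Python floats.

    `spark` is assumed to be an existing SparkSession.
    """
    # "double" is the datatype-string form of DoubleType().
    # With an atomic schema, the single column is named "value".
    return spark.createDataFrame(rdd, "double")
```

With a live session you would call it as df = rdd_to_double_df(spark, myFloatRdd).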
