How can I load an existing CSV file and convert it to a DataFrame in Spark?

I want the exact command to load a CSV file as a DataFrame.

I have tried:

scala> val df = sqlContext.load("hdfs:///csv/file/dir/file.csv")

But I got this error:

java.lang.RuntimeException: hdfs:///csv/file/dir/file.csv is not a Parquet file.

1 Answer

A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to a table in a relational database.

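CSV is one of Spark's built-in data sources (since Spark 2.0), so no extra package is needed to read it. The error in the question occurs because load() without an explicit format falls back to the default data source, which is Parquet.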

A DataFrame can be created from a variety of input sources, including CSV text files, JSON files, etc.

To load a CSV file as a DataFrame, run this command in the Spark shell:

val df = spark.read.format("csv").option("header", "true").load("/home/amit/uo.csv")
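As a slightly fuller sketch of the same command, the snippet below also asks Spark to infer column types and then inspects the result. It assumes a Spark 2.x or later shell, where `spark` is the predefined SparkSession, and it reuses the example path from above. The `inferSchema` option is an addition for illustration; without it, every column is read as a string.

```scala
// Paste into spark-shell (Spark 2.x or later); `spark` is the SparkSession
// that the shell creates for you.
// The path is the example path from the answer; substitute your own file.
val df = spark.read
  .format("csv")                  // name the format explicitly so Spark does not assume Parquet
  .option("header", "true")       // treat the first line as column names
  .option("inferSchema", "true")  // guess column types; costs an extra pass over the data
  .load("/home/amit/uo.csv")

df.printSchema()  // print column names and inferred types
df.show(5)        // print the first five rows
```

For a file on HDFS, as in the question, the same call should work with the `hdfs:///...` path passed to `load`.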
