I'm new to Spark and I'm trying to read CSV data from a file. Here's what I'm doing:

    sc.textFile("file.csv") \
        .map(lambda line: (line.split(',')[0], line.split(',')[1])) \
        .collect()

I would expect this call to give me a list of the first two columns of my file, but I'm getting this error:

File "<ipython-input-60-73ea98550983>", line 1, in <lambda>
IndexError: list index out of range

although my CSV file has more than one column.

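The error means that at least one line (a blank line, for example) contains no comma, so splitting it yields a single element and index 1 does not exist. Try this instead of your code: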

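    # Drop lines with fewer than two fields (e.g. blank lines) before indexing.
    sc.textFile("file.csv") \
        .map(lambda line: line.split(",")) \
        .filter(lambda fields: len(fields) > 1) \
        .map(lambda fields: (fields[0], fields[1])) \
        .collect()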


This should work for you.
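
A quick way to see where the IndexError comes from is to run the same split-and-index logic on a plain Python list, with no Spark involved. This is only a sketch: first_two_columns and sample_lines are made-up names and made-up data for illustration, not part of any Spark API.

```python
def first_two_columns(lines):
    """Plain-Python analogue of a split / drop-short-rows / project chain."""
    split_rows = (line.split(",") for line in lines)
    # Keep only rows that really have a second field.
    complete_rows = (fields for fields in split_rows if len(fields) > 1)
    return [(fields[0], fields[1]) for fields in complete_rows]


# Hypothetical data: two good lines, one blank line, one line without a comma.
sample_lines = ["a,b,c", "", "d,e,f", "no_comma_here"]

# Indexing without a length check fails on the blank line:
# "".split(",") returns [""], so index 1 is out of range.
try:
    [(line.split(",")[0], line.split(",")[1]) for line in sample_lines]
except IndexError as exc:
    naive_error = exc
else:
    naive_error = None

result = first_two_columns(sample_lines)
print(result)  # [('a', 'b'), ('d', 'e')]
```

Note that splitting on "," also breaks on quoted fields that contain commas, so for real-world CSV a proper parser is usually the safer choice.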

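You can also use the built-in CSV data source directly (Spark 2.0+), where spark is your SparkSession and schema is a schema you have defined beforehand:

    spark.read.csv(
        "some_input_file.csv", header=True, mode="DROPMALFORMED", schema=schema
    )

or, equivalently, by setting the same options one at a time:

    spark.read.schema(schema) \
        .option("header", "true") \
        .option("mode", "DROPMALFORMED") \
        .csv("some_input_file.csv")
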

without including any external dependencies.
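
To get a feel for what a header row plus a drop-malformed mode buys you, here is a rough standard-library analogue. It is only an illustration of the idea, not how Spark implements it: read_csv_drop_malformed and sample_text are hypothetical names, and "malformed" is simplified here to "wrong number of fields", whereas Spark also checks values against the schema's types.

```python
import csv
import io


def read_csv_drop_malformed(text):
    """Parse CSV text, use the first row as the header, skip rows of the wrong width."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to parse
    # Keep only rows with exactly one value per header column.
    return [dict(zip(header, row)) for row in reader if len(row) == len(header)]


# Hypothetical input: a quoted field containing a comma, a short row, a blank line.
sample_text = 'name,city\n"Doe, Jane",Paris\nbroken_row\n\nSmith,Oslo\n'
records = read_csv_drop_malformed(sample_text)
print(records)
```

Unlike a bare split on ",", the csv module keeps "Doe, Jane" as a single field, which is one more reason to prefer a real CSV parser over manual splitting.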
