Suppose I give three file paths to a Spark context to read, and each file has a schema (header) in its first row. How can we skip these header lines?

val rdd = sc.textFile("file1,file2,file3")

Now, how can we skip the header lines in this RDD?

A simple way is to filter the initial read based on what your header looks like:

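// X is your comma-separated list of paths; keep only lines that do not start with the header text
val rdd = sc.textFile(X).filter(!_.startsWith("beginningOfYourHeader")).cache()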
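If you'd rather not hard-code the header text, a common variant is to take the first line of the RDD as the header and remove every line equal to it. This is a minimal PySpark sketch that assumes all three files share an identical header line and that no data row is identical to it; drop_headers and is_not_header are illustrative names, and the paths are the placeholders from the question.

```python
def is_not_header(header):
    """Return a predicate that is False only for exact copies of the header line."""
    return lambda line: line != header


def drop_headers(rdd):
    """Remove the header line of every input file.

    Assumes every file starts with the same header as the first file.
    """
    header = rdd.first()  # first line of the first file
    return rdd.filter(is_not_header(header))


if __name__ == "__main__":
    # Spark is only needed when the sketch is run as a script.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.textFile("file1,file2,file3")
    print(drop_headers(rdd).take(5))
```

Compared with the prefix filter, this runs one extra small job for first(), but it keeps working if the header text changes.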

For Spark 2.0 and later, you can use SparkSession to do this in one line:

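import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.config(conf).getOrCreate()

// "header" = "true" treats the first line as column names instead of data
val dataFrame = spark.read.format("csv").option("header", "true").load(csvfilePath)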

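Another approach, using the Python (PySpark) API, is to drop the first line of the first partition with mapPartitionsWithIndex:

from itertools import islice

rdd = sc.textFile("file1,file2,file3")
# Skip the first line of partition 0 and pass every other partition through unchanged.
# Note: this typically removes only the first file's header, not those of file2 and file3.
rdd_no_header = rdd.mapPartitionsWithIndex(
    lambda idx, it: islice(it, 1, None) if idx == 0 else it
)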
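To drop the header of every file instead of only the first partition, one option is sc.wholeTextFiles, which returns one (path, content) record per file, so each file's header is simply the first line of its own record. This is a sketch that assumes each file is small enough to hold in memory as a single string; drop_header_per_file and lines_without_header are illustrative names.

```python
def lines_without_header(content):
    """Split one file's text into lines and drop the first (header) line."""
    return content.splitlines()[1:]


def drop_header_per_file(sc, paths):
    """Return an RDD of data lines from `paths`, with each file's header removed."""
    # wholeTextFiles accepts a comma-separated list of paths, like textFile,
    # and yields one (path, content) pair per file.
    pairs = sc.wholeTextFiles(",".join(paths))
    return pairs.flatMap(lambda pair: lines_without_header(pair[1]))


if __name__ == "__main__":
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = drop_header_per_file(sc, ["file1", "file2", "file3"])
    print(rdd.take(5))
```

The trade-off is that a single file is never split across tasks, so for very large files the header-filter approach scales better.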


