in Big Data Hadoop & Spark by (11.5k points)
I want to read a set of text files from an HDFS location and apply a map operation to them iteratively using Spark.

JavaRDD<String> records = ctx.textFile(args[1], 1); is capable of reading only one file at a time.

I want to read more than one file and process them as a single RDD. How?

1 Answer


Create one RDD per file first, then combine them into a single RDD with sc.union.

val sc = new SparkContext(...)
val r1 = sc.textFile("xxx1")
val r2 = sc.textFile("xxx2")
val rdds = Seq(r1, r2, ...)
val FinalRdd = sc.union(rdds)

FinalRdd is now a single RDD containing the lines of all the files.
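
Since the question uses the Java API, here is an alternative sketch. textFile also accepts a comma-separated list of paths, a directory, or a wildcard pattern, so a single call can read several files into one RDD. The class name MultiFileInput, the helper joinPaths, and the HDFS paths are hypothetical and only illustrate the idea; the Spark call itself appears in comments so that the snippet compiles without Spark on the classpath.

```java
import java.util.Arrays;
import java.util.List;

public class MultiFileInput {
    // Joins individual paths into the comma-separated form that textFile accepts.
    public static String joinPaths(List<String> paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Hypothetical HDFS paths, used only for illustration.
        List<String> paths = Arrays.asList(
                "hdfs:///data/logs/part1.txt",
                "hdfs:///data/logs/part2.txt");
        String input = joinPaths(paths);
        System.out.println(input);

        // With a JavaSparkContext named ctx, as in the question:
        //   JavaRDD<String> records = ctx.textFile(input, 1);
        // A directory or wildcard pattern can be passed the same way:
        //   JavaRDD<String> records = ctx.textFile("hdfs:///data/logs/*.txt", 1);
    }
}
```

Compared with building one RDD per file and calling union, this needs only one textFile call, which is probably more convenient when the number of files is large or not known in advance.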
