Is there a way to concatenate the datasets of two different RDDs in Spark?

My requirement: I create two intermediate RDDs in Scala that have the same column names. I need to combine the results of both RDDs and cache the combined result so the UI can access it. How do I combine the datasets here?

Both RDDs are of type spark.sql.SchemaRDD.

1 Answer

I think the join transformation is what you need, since it combines two datasets by key. Joining a dataset of type (K, V) with a dataset of type (K, W) yields a dataset of type (K, (V, W)), with one entry for each pair of values that share a key.

Just follow the approach given below:

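// sc is the SparkContext (predefined in spark-shell)
// Pair RDD of (id, amount); key 110 appears twice
val rdd1 = sc.parallelize(List((110, 50.35), (127, 305.2), (126, 211.0), (105, 6.0), (165, 31.0), (110, 40.11)))
// Pair RDD of (id, label)
val rdd2 = sc.parallelize(List((110, "a"), (127, "b"), (126, "b"), (105, "a"), (165, "c")))
// Inner join on the key; the result has type RDD[(Int, (Double, String))]
val joined = rdd1.join(rdd2)
joined.collect().foreach(println)

Output (row order is not guaranteed):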
(105,(6.0,a))
(165,(31.0,c))
(110,(50.35,a))
(110,(40.11,a))
(126,(211.0,b))
(127,(305.2,b))
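
To make the (K, V) and (K, W) to (K, (V, W)) rule concrete, here is a minimal sketch that models an inner join on plain Scala collections, so it runs without a Spark cluster. The object name PairJoinSketch, the helper innerJoin, and the sample key 999 are hypothetical and only for illustration; this is not how Spark implements join internally.

```scala
// Plain-Scala model of an inner join on key-value pairs (no Spark required).
object PairJoinSketch {
  // Hypothetical helper: pairs every V with every W that shares the same key.
  def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    // Group the right side by key so each lookup is a single map access.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2) }
    for {
      (k, v) <- left
      w <- rightByKey.getOrElse(k, Seq.empty[W]) // keys missing on the right yield nothing
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    val left = Seq((110, 50.35), (127, 305.2), (110, 40.11), (999, 1.0))
    val right = Seq((110, "a"), (127, "b"))
    // Key 110 appears twice on the left, so it produces two joined rows;
    // key 999 has no match on the right, so it is dropped.
    innerJoin(left, right).foreach(println)
  }
}
```

Note that a join matches rows by key and drops keys that appear on only one side. If the goal is instead to append the rows of one RDD to the other, the RDD API also offers union; which of the two fits depends on what "combine" means for your data.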
