The Cartesian product and combinations are two different things: for an RDD of n elements, `cartesian` produces an RDD of n^2 pairs, while the combinations (built as `combs` in the code below) form an RDD of only n choose 2 pairs.

val rdd = spark.sparkContext.parallelize(1 to 5)

// Self-join via cartesian, then keep only strictly ordered pairs so that
// each unordered pair {a, b} appears exactly once.
val combs = rdd.cartesian(rdd).filter { case (a, b) => a < b }

combs.collect()

Note that this only works if an ordering is defined on the elements, since we filter with `<`. As written it chooses pairs, but it extends to larger selections by requiring a < b for every adjacent pair a, b in the chosen tuple (e.g. a < b < c for triples).
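To illustrate the extension to triples without needing a Spark cluster, here is a plain-Scala sketch of the same idea: take the three-way cross product and keep only strictly increasing tuples. The values and the check against the standard library's `combinations` are illustrative, not from the original answer.

```scala
// The same filter idea, extended to 3-element combinations on a plain
// Scala collection (no Spark required).
val xs = (1 to 5).toList

// Three-way cross product, keeping only strictly increasing tuples so
// each 3-element subset appears exactly once.
val triples = for {
  a <- xs
  b <- xs
  c <- xs
  if a < b && b < c // i.e. a < b < c
} yield (a, b, c)

// Sanity check against the standard library's combinations:
val expected = xs.combinations(3).map { case List(a, b, c) => (a, b, c) }.toList
assert(triples == expected) // both enumerate "5 choose 3" = 10 triples
println(triples.size)
```

On an RDD, the analogous extension would be `rdd.cartesian(rdd).cartesian(rdd)` followed by the same `a < b && b < c` filter, at the cost of materializing n^3 tuples before filtering.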