Spark has no built-in way to split an RDD into two RDDs with a single predicate. Still, filtering the same data twice isn't that bad if you cache it beforehand, since the filtering itself is quick.
If it's really just two different types, you can use a helper method:
import org.apache.spark.rdd.RDD

implicit class RDDOps[T](rdd: RDD[T]) {
  def partitionBy(f: T => Boolean): (RDD[T], RDD[T]) = {
    val passes = rdd.filter(f)
    val fails = rdd.filter(e => !f(e)) // Spark doesn't have filterNot
    (passes, fails)
  }
}
val (matches, matchesNot) = sc.parallelize(1 to 100).cache().partitionBy(_ % 2 == 0)
And keep in mind that as soon as you have more than two types of data, you can just filter the cached RDD once per type and assign each result to its own val, as in the sketch below.
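A minimal sketch of that idea, splitting one RDD into three groups; the group boundaries and val names here are purely illustrative:

// Cache once so each filter rereads memory instead of recomputing the source.
val rdd = sc.parallelize(1 to 100).cache()

// One filter per group, each assigned to its own val.
val low  = rdd.filter(_ <= 33)
val mid  = rdd.filter(n => n > 33 && n <= 66)
val high = rdd.filter(_ > 66)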