
I have the following DataFrame:

val transactions_with_counts = sqlContext.sql(
  """SELECT user_id AS user_id, category_id AS category_id,
  COUNT(category_id) FROM transactions GROUP BY user_id, category_id""")


I'm trying to convert the rows to Rating objects, but since x(0) returns Any, this fails:

val ratings = transactions_with_counts
  .map(x => Rating(x(0).toInt, x(1).toInt, x(2).toInt))


error: value toInt is not a member of Any

1 Answer


Let's take some dummy data:

val transactions = Seq((1, 2), (1, 4), (2, 3)).toDF("user_id", "category_id")

val transactions_with_counts = transactions
  .groupBy($"user_id", $"category_id")
  .count

transactions_with_counts.printSchema

// root
//  |-- user_id: integer (nullable = false)
//  |-- category_id: integer (nullable = false)
//  |-- count: long (nullable = false)

Now, here are some ways to access the Row values while keeping the expected types:

1. Pattern matching:

import org.apache.spark.sql.Row
import org.apache.spark.mllib.recommendation.Rating

transactions_with_counts.map {
  case Row(user_id: Int, category_id: Int, rating: Long) =>
    Rating(user_id, category_id, rating)
}

2. Typed get* methods like getInt, getLong:

transactions_with_counts.map(
  r => Rating(r.getInt(0), r.getInt(1), r.getLong(2))
)

3. The getAs method, which can use both names and indices:

transactions_with_counts.map(r => Rating(
  r.getAs[Int]("user_id"), r.getAs[Int]("category_id"), r.getAs[Long](2)
))

It can also be used to properly extract user-defined types, including mllib.linalg.Vector. Obviously, accessing by name requires a schema.
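For example, a minimal sketch of pulling a vector column back out of a Row; the featurized DataFrame and its "features" column are hypothetical names, not part of the question:

import org.apache.spark.mllib.linalg.Vector

// Hypothetical: featurized is a DataFrame whose "features" column holds mllib Vectors
val vectors = featurized.map(r => r.getAs[Vector]("features"))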

4. Converting to a statically typed Dataset (Spark 1.6+ / 2.0+):

transactions_with_counts.as[(Int, Int, Long)]
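
From there, a sketch of getting back to Rating objects, assuming spark.implicits._ (or sqlContext.implicits._ on 1.6) is in scope for the tuple encoder and the same mllib Rating class as above:

val ratings = transactions_with_counts
  .as[(Int, Int, Long)]
  .map { case (user_id, category_id, rating) => Rating(user_id, category_id, rating) }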
