
Using Spark ML transformers I arrived at a DataFrame where each row looks like this:

Row(object_id, text_features, color_features, type_features)

where text_features is a sparse vector of term weights, color_features is a small 20-element one-hot encoded dense vector of colors, and type_features is also a one-hot encoded dense vector of types.

What would be a good approach (using Spark's facilities) to merge these features into one single, large vector, so that I can measure things like the cosine distance between any two objects?

1 Answer


Use VectorAssembler. It concatenates the listed input columns (numeric or vector) into a single vector column.

For example:

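import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

// Placeholder: the DataFrame produced by the earlier transformers
val df: DataFrame = ???

// Concatenate the three feature columns, in this order, into one vector column
val assembler = new VectorAssembler()
  .setInputCols(Array("text_features", "color_features", "type_features"))
  .setOutputCol("features")

val transformed = assembler.transform(df)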
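Once the features are assembled, the cosine distance mentioned in the question could be computed along the following lines. This is only a minimal sketch: the object name CosineDistanceSketch and the helper cosineDistance are made up for illustration and are not part of Spark. It also converts both vectors to dense arrays, which is simple but wasteful when the text features are very wide and sparse.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical helper object, not part of Spark.
object CosineDistanceSketch {

  // Cosine distance = 1 - cosine similarity.
  // Assumption: densifying with toArray is acceptable for the vector sizes involved.
  def cosineDistance(a: Vector, b: Vector): Double = {
    require(a.size == b.size, "vectors must have the same dimension")
    val dot = a.toArray.zip(b.toArray).map { case (x, y) => x * y }.sum
    val norms = Vectors.norm(a, 2.0) * Vectors.norm(b, 2.0)
    require(norms > 0.0, "cosine distance is undefined for a zero vector")
    1.0 - dot / norms
  }

  def main(args: Array[String]): Unit = {
    // Toy vectors standing in for two rows of the assembled "features" column.
    val u = Vectors.sparse(4, Array(0, 2), Array(1.0, 1.0))
    val v = Vectors.dense(1.0, 0.0, 1.0, 0.0)
    val w = Vectors.dense(0.0, 1.0, 0.0, 0.0)

    println(cosineDistance(u, v)) // close to 0.0: same direction
    println(cosineDistance(u, w)) // 1.0: orthogonal
  }
}
```

To apply this to the assembled DataFrame, the vector of a given row can be read with row.getAs[Vector]("features") and passed to the helper.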

For more details on VectorAssembler, see the Spark tutorial.

Hope this answer helps you!
