
I was looking at the Word2Vec example on the Spark site:

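import org.apache.spark.mllib.feature.Word2Vec

// Each line of the corpus becomes a sequence of tokens (sc is the shell's SparkContext)
val input = sc.textFile("text8").map(line => line.split(" ").toSeq)
val word2vec = new Word2Vec()
val model = word2vec.fit(input)
// Look up the 40 words closest to the given word
val synonyms = model.findSynonyms("country name here", 40)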

How do I do the interesting vector arithmetic, such as king - man + woman = queen? I can use model.getVectors, but I am not sure how to proceed from there.

1 Answer


Here is an example in PySpark, which I guess is straightforward to port to Scala. The key is the use of model.transform.

from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext()
# One list of tokens per input line
inp = sc.textFile("text8_lines").map(lambda row: row.split(" "))

k = 200  # vector dimensionality
word2vec = Word2Vec().setVectorSize(k)
model = word2vec.fit(inp)

k is the dimensionality of the word vectors. Higher is generally better (the default is 100), but it costs memory; the highest value I could use on my machine was 200.

...
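To make the arithmetic from the question concrete, here is a minimal sketch in plain Python that does not need Spark. It builds the query vector king - man + woman and ranks the remaining words by cosine similarity. The vectors in `toy_vectors` are made up for illustration, and the helper names `cosine` and `analogy` are hypothetical; with a trained model the vectors would come from model.transform instead.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# With a real model, each vector would come from model.transform(word).
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c, top_n=1):
    """Rank words by similarity to vec(a) - vec(b) + vec(c)."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three input words; they tend to rank near the top otherwise.
    scored = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word not in (a, b, c)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


result = analogy(toy_vectors, "king", "man", "woman")
print(result[0][0])
```

In PySpark MLlib, model.transform(word) returns the vector for a word, and model.findSynonyms accepts either a word or a vector, so the same idea should carry over: combine the three transformed vectors and pass the result to findSynonyms. Expect the input words themselves to show up among the nearest neighbours, which is why the sketch filters them out.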