in Machine Learning by (19k points)

The docs for an Embedding Layer in Keras say:

Turns positive integers (indexes) into dense vectors of fixed size. eg.

[[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

I believe this could also be achieved by encoding the inputs as one-hot vectors of length vocabulary_size and feeding them into a Dense Layer.

Is an Embedding Layer merely a convenience for this two-step process, or is something fancier going on under the hood?

1 Answer

by (33.1k points)

The main difference lies in the operation each layer performs.

An embedding layer performs a select (lookup) operation. In Keras, its forward pass is equivalent to:

K.gather(self.embeddings, inputs)      
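As a rough illustration of what that gather does, here is a NumPy sketch. It is not the Keras implementation; the table values, shapes, and the names `embeddings` and `inputs` are made up for the example. The point is only that each integer index selects one row of the table:

```python
import numpy as np

# Hypothetical embedding table: vocabulary of 5 tokens, 2 dimensions each.
embeddings = np.array([
    [0.0, 0.0],
    [0.25, 0.1],
    [0.6, -0.2],
    [0.3, 0.9],
    [-0.5, 0.4],
])

# A batch of 2 sequences, each holding 3 token indices.
inputs = np.array([[1, 2, 0],
                   [4, 4, 3]])

# Analogue of K.gather(embeddings, inputs): pick one table row per index.
outputs = np.take(embeddings, inputs, axis=0)

print(outputs.shape)  # (2, 3, 2)
```

No multiplication happens here; the output is just rows copied out of the table.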

A dense layer performs the dot-product operation, plus an optional activation:

outputs = matmul(inputs, self.kernel)
outputs = bias_add(outputs, self.bias)
return self.activation(outputs)
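To see how the two relate, the following NumPy sketch feeds one-hot vectors through a dense-style forward pass and compares the result with a plain row lookup. The weights are random and the names (`kernel`, `bias`, `indices`) are chosen for illustration; it assumes a zero bias and an identity activation, which is the only case where the two layers coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 3  # assumed toy sizes

# The same matrix plays the role of Dense kernel and embedding table.
kernel = rng.normal(size=(vocab_size, embed_dim))
bias = np.zeros(embed_dim)  # zero bias, assumed for the comparison

indices = np.array([4, 1, 1, 5])

# One-hot encode the indices: shape (4, vocab_size).
one_hot = np.eye(vocab_size)[indices]

# Dense-style forward pass: matmul, bias add, identity activation.
dense_out = one_hot @ kernel + bias

# Embedding-style forward pass: row lookup.
embed_out = kernel[indices]

same = np.allclose(dense_out, embed_out)
print(same)  # True
```

Both paths give the same numbers, but the dense path multiplies every input by the full kernel, mostly by zeros.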

You can imitate an embedding layer with a fully-connected layer (no bias, linear activation) fed with one-hot vectors, but the whole point of a dense embedding is to avoid the one-hot representation. In NLP, the vocabulary size can be on the order of 100k words, and you usually need to process batches of word sequences. Processing a batch of sequences of word indices is much more efficient than processing a batch of sequences of one-hot vectors. In addition, the gather operation itself is faster than a matrix dot product, in both the forward and backward passes.
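A back-of-the-envelope calculation makes the gap concrete. The sketch below assumes a batch of 32 sequences of 100 tokens, 128-dimensional embeddings, 4-byte elements, and a naive dense matmul; only the 100k vocabulary size comes from the text above, the rest are illustrative assumptions:

```python
# Assumed batch shape and embedding width (illustrative only).
batch_size, seq_len = 32, 100
vocab_size, embed_dim = 100_000, 128

# Number of input elements in each representation.
index_elems = batch_size * seq_len
one_hot_elems = batch_size * seq_len * vocab_size

# Input memory, assuming 4 bytes per element (int32 indices, float32 one-hots).
index_bytes = index_elems * 4
one_hot_bytes = one_hot_elems * 4

# Forward-pass work: multiply-adds for a naive dense matmul
# versus element copies for a gather.
matmul_macs = one_hot_elems * embed_dim
gather_copies = index_elems * embed_dim

print(index_bytes)                   # 12800
print(one_hot_bytes)                 # 1280000000
print(matmul_macs // gather_copies)  # 100000
```

Under these assumptions the one-hot input takes about 1.28 GB against 12.8 kB for the indices, and the naive matmul does `vocab_size` times as many operations as the gather.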

