The major difference is this:
An embedding layer performs a select (lookup) operation. In Keras, the layer is roughly equivalent to:
outputs = gather(self.embeddings, inputs)
A dense layer performs a dot-product operation, plus an optional bias and activation:
outputs = matmul(inputs, self.kernel)
outputs = bias_add(outputs, self.bias)
outputs = activation(outputs)
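The two operations can be sketched in plain NumPy (names such as `embeddings`, `kernel`, and the shapes are illustrative, not Keras internals):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4

# Embedding: a lookup table indexed by integer word ids (the "select").
embeddings = rng.normal(size=(vocab_size, embed_dim))
word_ids = np.array([3, 1, 7])
embedded = embeddings[word_ids]        # gather -> shape (3, embed_dim)

# Dense: dot-product plus bias (activation omitted for brevity).
kernel = rng.normal(size=(embed_dim, 2))
bias = np.zeros(2)
dense_out = embedded @ kernel + bias   # shape (3, 2)
```

Note that the embedding step never multiplies anything; it only indexes rows of a weight matrix.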
You can imitate an embedding layer with a fully-connected layer via one-hot encoding, but the whole point of dense embeddings is to avoid the one-hot representation. In NLP, the word vocabulary size can be on the order of 100k, and it is often necessary to process sequences of words in batches. Processing a batch of sequences of word indices is far more efficient than processing a batch of sequences of one-hot vectors. In addition, the gather operation itself is faster than a matrix dot-product, in both the forward and backward pass.