# Embeddings in Keras: Train vs. Pretrained

Most state-of-the-art NLP applications — e.g. machine translation and summarization — are now based on recurrent neural networks (RNNs). And more often than not, we'll need to choose a word representation beforehand.

Here are two ways of creating word representations:

1. One-hot Encoding: A simple method is to represent each word using a one-hot vector. Suppose your vocabulary contains 50K words; the nth word would then be represented as a 50K-dimensional vector that is all 0s except for a 1 at the nth position. With such a large vocabulary, however, this sparse representation is very inefficient.

2. Word Embeddings (⭐️): Ideally, you'd want similar words to have similar representations, making it easy for the model to generalize what it learns about a word to all similar words. For example, the representation for "car" should be more similar to "lorry" than, say, "pasta". This is the idea behind word embeddings.
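To make the contrast concrete, here's a minimal NumPy sketch of the two representations (the vocabulary size, embedding dimension, and word index are just illustrative):

```python
import numpy as np

vocab_size = 50_000   # hypothetical vocabulary size
embedding_dim = 32    # hypothetical embedding size
nth = 7               # index of some word in the vocabulary

# One-hot: a sparse 50K-dimensional vector with a single 1 at position nth.
one_hot_vec = np.zeros(vocab_size)
one_hot_vec[nth] = 1.0

# Embedding: a dense lookup table mapping each word index to a short vector.
embedding_table = np.random.rand(vocab_size, embedding_dim)
dense_vec = embedding_table[nth]

print(one_hot_vec.shape, dense_vec.shape)  # (50000,) (32,)
```

The embedding replaces a 50,000-dimensional sparse vector with a 32-dimensional dense one, and similar words can end up with nearby vectors.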

In a Nutshell:

• Word embeddings provide a dense representation of words and their relative meanings.

• They are an improvement over the sparse representations used in simpler bag-of-words models.

• Word embeddings can be learned from text data and reused among projects. They can also be learned as part of fitting a neural network on text data.

Let's explore two different ways to add an embedding layer in Keras:

1. Train your own embedding layer
2. Use a pretrained embedding (like GloVe)

#### Import Dependencies and Load Toy Data

```python
import re
import numpy as np
from keras.preprocessing.text import one_hot
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

# Define documents
docs = ['Well done!', 'Good work', 'Great effort', 'nice work', 'Excellent!',
        'Weak', 'Poor effort!', 'not good', 'poor work', 'Could have done better.']

# Define class labels
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
```


### 1. Train Your Own Embedding (oe) Layer

Note that we're using a Keras Sequential Model here to do the job.

One-hot encode the documents in docs:

```python
own_embedding_vocab_size = 10
encoded_docs_oe = [one_hot(d, own_embedding_vocab_size) for d in docs]
print(encoded_docs_oe)
```


Output:

```
[[2, 6], [5, 9], [2, 9], [4, 9], [2], [7], [2, 9], [3, 5], [2, 9], [5, 9, 6, 3]]
```
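Note that despite its name, Keras's one_hot actually hashes each word into the range [1, n), so distinct words can collide — notice how the indices 2 and 9 recur above. Here's a toy sketch of the same idea, with zlib.crc32 standing in for Keras's hash function (so the exact indices differ):

```python
import zlib

def toy_hash_encode(text, n):
    # Mimic the hashing trick: map each lowercased word to an integer
    # in [1, n-1] via a hash modulo the vocabulary size.
    return [1 + zlib.crc32(w.lower().encode()) % (n - 1) for w in text.split()]

print(toy_hash_encode('poor work', 10))
print(toy_hash_encode('Could have done better.', 10))
```

With a vocabulary this small, collisions are inevitable; that's acceptable for a toy example but lossy for real data.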


Pad each document to ensure they are all of the same length:

```python
from keras.preprocessing.sequence import pad_sequences

# Pad documents to a max length of 5 words
maxlen = 5
padded_docs_oe = pad_sequences(encoded_docs_oe, maxlen=maxlen, padding='post')
print(padded_docs_oe)
```


Output:

```
[[2 6 0 0 0]
 [5 9 0 0 0]
 [2 9 0 0 0]
 [4 9 0 0 0]
 [2 0 0 0 0]
 [7 0 0 0 0]
 [2 9 0 0 0]
 [3 5 0 0 0]
 [2 9 0 0 0]
 [5 9 6 3 0]]
```
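For intuition, here's a minimal pure-Python sketch of what post-padding (Keras's pad_sequences with padding='post') does, ignoring truncation of over-length sequences:

```python
def pad_post(sequences, maxlen, value=0):
    # Append `value` to the end of each sequence until it reaches maxlen,
    # so every row of the result has the same length.
    return [seq + [value] * (maxlen - len(seq)) for seq in sequences]

print(pad_post([[2, 6], [5, 9, 6, 3]], maxlen=5))
# [[2, 6, 0, 0, 0], [5, 9, 6, 3, 0]]
```

Uniform lengths matter because the Embedding layer's output is flattened into a fixed-size vector before the Dense layer.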


Define the model:

```python
model = Sequential()
model.add(Embedding(input_dim=own_embedding_vocab_size,
                    output_dim=32,
                    input_length=maxlen))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
```


Compile and train the model:

```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])  # Compile the model
print(model.summary())  # Summarize the model
model.fit(padded_docs_oe, labels, epochs=50, verbose=0)  # Fit the model
loss, accuracy = model.evaluate(padded_docs_oe, labels, verbose=0)  # Evaluate the model
print('Accuracy: %0.3f' % accuracy)
```

> _________________________________________________________________
> Layer (type)                 Output Shape              Param #
> =================================================================
> embedding_1 (Embedding)      (None, 5, 32)             320
> _________________________________________________________________
> flatten_1 (Flatten)          (None, 160)               0
> _________________________________________________________________
> dense_1 (Dense)              (None, 1)                 161
> =================================================================
> Total params: 481
> Trainable params: 481
> Non-trainable params: 0
> _________________________________________________________________
> None
> Accuracy: 0.800
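The parameter counts in the summary above follow directly from the layer shapes — a quick sanity check:

```python
vocab_size, embedding_dim, maxlen = 10, 32, 5

embedding_params = vocab_size * embedding_dim  # 10 * 32 = 320 trainable weights
flatten_units = maxlen * embedding_dim         # 5 * 32 = 160 inputs to the Dense layer
dense_params = flatten_units + 1               # 160 weights + 1 bias = 161
print(embedding_params + dense_params)         # 481, matching the summary
```

Since nothing is frozen, all 481 parameters are trainable.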


### 2. Use a Pretrained GloVe Embedding (ge) Layer

Note that we're using a Keras Functional Model here to do the job.

(⭐️) Download and use the load_glove_embeddings() function:

```python
from load_glove_embeddings import load_glove_embeddings

# Path assumes you've downloaded the 50-dimensional GloVe file; adjust as needed.
word2index, embedding_matrix = load_glove_embeddings('en_embeddings/glove.6B.50d.txt', embedding_dim=50)
```

One-hot encode the documents in docs with our special custom_tokenize() function, which requires the word2index variable from the previous step:

```python
def custom_tokenize(docs):
    output_matrix = []
    for d in docs:
        indices = []
        for w in d.split():
            # Strip punctuation and lowercase before looking up the GloVe index
            indices.append(word2index[re.sub(r'[^\w\s]', '', w).lower()])
        output_matrix.append(indices)
    return output_matrix

# Encode docs with our special "custom_tokenize" function
encoded_docs_ge = custom_tokenize(docs)
print(encoded_docs_ge)
```


Output:

```
[[143, 751], [219, 161], [353, 968], [3082, 161], [4345], [2690], [992, 968], [36, 219], [992, 161], [94, 33, 751, 439]]
```


Pad each document to ensure they are of the same length:

```python
from keras.preprocessing.sequence import pad_sequences

# Pad documents to a max length of 5 words
maxlen = 5
padded_docs_ge = pad_sequences(encoded_docs_ge, maxlen=maxlen, padding='post')
print(padded_docs_ge)
```


Output:

```
[[ 143  751    0    0    0]
 [ 219  161    0    0    0]
 [ 353  968    0    0    0]
 [3082  161    0    0    0]
 [4345    0    0    0    0]
 [2690    0    0    0    0]
 [ 992  968    0    0    0]
 [  36  219    0    0    0]
 [ 992  161    0    0    0]
 [  94   33  751  439    0]]
```


Define the model (note that the embedding_matrix variable is required here):

```python
from keras.models import Model
from keras.layers import Input

embedding_layer = Embedding(input_dim=embedding_matrix.shape[0],
                            output_dim=embedding_matrix.shape[1],
                            input_length=maxlen,
                            weights=[embedding_matrix],
                            trainable=False,
                            name='embedding_layer')

i = Input(shape=(maxlen,), dtype='int32', name='main_input')
x = embedding_layer(i)
x = Flatten()(x)
o = Dense(1, activation='sigmoid')(x)

model = Model(inputs=i, outputs=o)
```


Compile and train the model:

```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])  # Compile the model
print(model.summary())  # Summarize the model
model.fit(padded_docs_ge, labels, epochs=50, verbose=0)  # Fit the model
loss, accuracy = model.evaluate(padded_docs_ge, labels, verbose=0)  # Evaluate the model
print('Accuracy: %0.3f' % accuracy)
```

> _________________________________________________________________
> Layer (type)                 Output Shape              Param #
> =================================================================
> main_input (InputLayer)      (None, 5)                 0
> _________________________________________________________________
> embedding_layer (Embedding)  (None, 5, 50)             20000050
> _________________________________________________________________
> flatten_2 (Flatten)          (None, 250)               0
> _________________________________________________________________
> dense_2 (Dense)              (None, 1)                 251
> =================================================================
> Total params: 20,000,301
> Trainable params: 251
> Non-trainable params: 20,000,050
> _________________________________________________________________
> None
> Accuracy: 1.000
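Again, the parameter counts check out. The 400,001 embedding rows are inferred from the summary above (glove.6B's 400K-word vocabulary plus one extra row, presumably for padding); since the GloVe weights are frozen, only the Dense layer trains:

```python
glove_vocab_rows, glove_dim, maxlen = 400_001, 50, 5  # inferred from the model summary

frozen = glove_vocab_rows * glove_dim  # 20,000,050 non-trainable embedding weights
trainable = maxlen * glove_dim + 1     # 250 Dense weights + 1 bias = 251 trainable
print(frozen + trainable)              # 20,000,301 total params
```

Freezing the embedding is why a model with 20M parameters can fit 10 documents without touching 99.99% of its weights.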


### In a Nutshell

Here's the main difference:

1. Own embedding (trained from scratch):

```python
embedding_layer_1 = Embedding(input_dim=own_embedding_vocab_size,
                              output_dim=32,
                              input_length=maxlen)
```

2. Pretrained embedding (requires embedding_matrix):

```python
embedding_layer_2 = Embedding(input_dim=embedding_matrix.shape[0],
                              output_dim=embedding_matrix.shape[1],
                              input_length=maxlen,
                              weights=[embedding_matrix],
                              trainable=False)
```


If you enjoyed this post and want to buy me a cup of coffee...

The thing is, I'll always accept a cup of coffee. So feel free to buy me one.

Cheers! ☕️

#### Jovian Lin, Ph.D.

A Singaporean with a fiery passion for solving real-life problems with machine learning and intelligent hacks.