NLP and Word Embeddings

ke, 30 July 2018

[ deep_learning  ]
  • Word representation

Featurized representation: word embedding (each word is a dense vector of learned features, rather than a sparse one-hot vector)

Visualizing word embeddings: project the high-dimensional (e.g. 300-d) vectors down to 2D, commonly with t-SNE; related words end up close together.
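
A minimal sketch of such a visualization, assuming scikit-learn and matplotlib are available; the random vectors are toy stand-ins for real learned embeddings:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

words = ["man", "woman", "king", "queen", "apple", "orange"]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(words), 300))  # stand-in for learned 300-d vectors

# Project the 300-d vectors down to 2-d for plotting.
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(embeddings)

for (x, y), w in zip(coords, words):
    plt.scatter(x, y)
    plt.annotate(w, (x, y))
plt.show()
```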

  • Using word embeddings
  1. Learn word embeddings from a large text corpus (1-100B words).
    (Or download a pre-trained embedding online.)

  2. Transfer the embedding to a new task with a smaller training set
    (say, 100k words); see the sketch after this list.

  3. Optional: continue to fine-tune the word embeddings with the new data.
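
A minimal sketch of steps 1-2: downloaded embeddings typically come as a GloVe-style text file ("word v1 v2 ... vd" per line); here two made-up inline vectors stand in for that file so the snippet runs as-is:

```python
import io
import numpy as np

# Stand-in for a downloaded pre-trained embedding file.
pretrained = io.StringIO(
    "orange 0.12 -0.55 0.31\n"
    "juice 0.08 -0.47 0.29\n"
)

word_to_vec = {}
for line in pretrained:
    parts = line.split()
    word_to_vec[parts[0]] = np.array(parts[1:], dtype=np.float32)

# Each word of the small-training-set task is now represented by its
# pre-trained vector; step 3 would optionally keep training these vectors.
features = np.stack([word_to_vec[w] for w in ["orange", "juice"]])
print(features.shape)  # (2, 3)
```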

  • Properties of word embeddings

Cosine similarity: to answer "man is to woman as king is to ?", find the word w that maximizes
sim(e_w, e_king - e_man + e_woman)    (the answer should be "queen")
sim(u, v) = (u^T v) / (||u||_2 ||v||_2)
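
A toy sketch of solving the analogy with cosine similarity; the five 3-d vectors are made up for illustration (real embeddings would be ~300-d):

```python
import numpy as np

def cosine_sim(u, v):
    # sim(u, v) = (u^T v) / (||u||_2 ||v||_2)
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

emb = {
    "man":   np.array([ 1.0, 0.0, 0.2]),
    "woman": np.array([ 1.0, 1.0, 0.2]),
    "king":  np.array([ 0.9, 0.1, 0.9]),
    "queen": np.array([ 0.9, 1.1, 0.9]),
    "apple": np.array([-0.5, 0.1, 0.0]),
}

target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine_sim(emb[w], target))
print(best)  # queen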

  • Embedding matrix
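
The embedding matrix E (shape 300 x 10,000 in the lecture) times the one-hot vector o_j gives the embedding of word j: e_j = E o_j. In practice the multiply is replaced by a direct column lookup. A small numpy sketch:

```python
import numpy as np

vocab_size, emb_dim = 10000, 300
rng = np.random.default_rng(0)
E = rng.normal(size=(emb_dim, vocab_size))  # learned parameters

j = 6257                      # index of some word, e.g. "orange"
o_j = np.zeros(vocab_size)
o_j[j] = 1.0                  # one-hot vector

e_j = E @ o_j                 # mathematically: the embedding of word j
assert np.allclose(e_j, E[:, j])  # efficient column lookup gives the same vector
```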

  • Learning word embeddings


Neural language model: embed each word of a fixed context window (e.g. "a glass of orange"), concatenate the embeddings, and feed them to a softmax over the vocabulary, which outputs the probability of each possible next word, e.g. p("juice" | context).
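
A numpy sketch of this forward pass; the word ids, the 4-word window, and all shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, window = 10000, 300, 4

E = rng.normal(size=(emb_dim, vocab_size)) * 0.01           # embedding matrix
W = rng.normal(size=(vocab_size, window * emb_dim)) * 0.01  # softmax weights
b = np.zeros(vocab_size)

context_ids = np.array([4, 213, 77, 6257])   # "a glass of orange" (hypothetical ids)
h = E[:, context_ids].T.reshape(-1)          # concatenated context embeddings

logits = W @ h + b
p = np.exp(logits - logits.max())
p /= p.sum()                                 # softmax over all 10,000 words
print(p.argmax(), p.max())                   # model's predicted next-word id
```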

  • Word2Vec

Skip-grams: randomly pick a context word c, then a target word t within a +/-k word window around c, and train a softmax model to predict t from e_c. The softmax denominator (a sum over the whole vocabulary) makes this expensive.
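
A sketch of how (context, target) training pairs could be sampled, with an assumed window of k = 5 words:

```python
import numpy as np

rng = np.random.default_rng(0)
sentence = "i want a glass of orange juice to go along with my cereal".split()
k = 5  # window size

def sample_pair(words, k, rng):
    # Pick a context position, then a target within +/-k words of it.
    while True:
        c = int(rng.integers(len(words)))
        offset = int(rng.integers(-k, k + 1))
        t = c + offset
        if offset != 0 and 0 <= t < len(words):
            return words[c], words[t]

for _ in range(5):
    print(sample_pair(sentence, k, rng))
```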

  • Negative sampling
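
Negative sampling replaces the expensive softmax with binary classifiers: label the true (context, target) pair 1 and K randomly sampled words 0, with negatives drawn from the heuristic distribution P(w) proportional to f(w)^(3/4), where f(w) is the observed word frequency (K around 5-20 for small datasets, 2-5 for large ones). A sketch of generating one such training example:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
corpus = "i want a glass of orange juice to go along with my cereal".split()

freq = Counter(corpus)
words = list(freq)
p = np.array([freq[w] ** 0.75 for w in words])
p /= p.sum()                                  # sampling distribution P(w)

def training_example(context, target, K=4):
    # Label 1 for the true pair, label 0 for K randomly sampled words.
    negatives = rng.choice(words, size=K, p=p)
    return [(context, target, 1)] + [(context, w, 0) for w in negatives]

print(training_example("orange", "juice"))
```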

  • GloVe (global vectors for word representation)

I want a glass of orange juice to go along with my cereal.

For a context word c and a target word t:
X_ij = number of times word i (the target t) appears in the context of word j (the context c), e.g. within +/-10 words.
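
GloVe minimizes sum_{i,j} f(X_ij) (theta_i^T e_j + b_i + b'_j - log X_ij)^2, where the weighting f(X_ij) is 0 when X_ij = 0 and caps very frequent pairs. A toy sketch of gradient steps on this objective; the co-occurrence counts are fabricated (and kept positive so log X_ij is defined), while f's constants are the ones from the GloVe paper:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 6, 3                                   # toy vocabulary and embedding size
X = rng.integers(0, 10, size=(V, V)) + 1      # fake co-occurrence counts (> 0)

theta = rng.normal(size=(V, d)) * 0.1
e = rng.normal(size=(V, d)) * 0.1
b, b2 = np.zeros(V), np.zeros(V)

def f(x, x_max=100, alpha=0.75):
    # Down-weights rare pairs and caps frequent ones.
    return np.minimum((x / x_max) ** alpha, 1.0)

lr = 0.05
for _ in range(100):
    for i in range(V):
        for j in range(V):
            diff = theta[i] @ e[j] + b[i] + b2[j] - np.log(X[i, j])
            g = 2 * f(X[i, j]) * diff
            gi, gj = g * e[j], g * theta[i]   # gradients before updating
            theta[i] -= lr * gi
            e[j]     -= lr * gj
            b[i]     -= lr * g
            b2[j]    -= lr * g
```

Because theta and e play symmetric roles, the final embedding of a word is often taken as the average (theta_w + e_w) / 2.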

  • Sentiment classification

Sentiment classification problem: map a piece of text (e.g. a restaurant review) to a sentiment label such as a 1-5 star rating; labeled training sets are often small, which is where transferred word embeddings help.

Simple sentiment classification model: average (or sum) the embeddings of the review's words and feed the result to a softmax classifier. This ignores word order, so it fails on reviews like "completely lacking in good taste, good service, and good ambience" (many "good"s, but negative).

RNN for sentiment classification: feed the word embeddings into a many-to-one RNN so that word order is taken into account.
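
A numpy sketch of the simple averaging model; the embeddings and softmax weights are random toys here, where a real model would use pre-trained vectors and trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, n_classes = 50, 5
word_to_vec = {w: rng.normal(size=emb_dim)
               for w in "the food was good not".split()}  # toy embeddings

W = rng.normal(size=(n_classes, emb_dim)) * 0.1
b = np.zeros(n_classes)

def predict_stars(review):
    # Average the word embeddings, then apply a softmax over star ratings.
    avg = np.mean([word_to_vec[w] for w in review.split()], axis=0)
    logits = W @ avg + b
    p = np.exp(logits - logits.max())
    return (p / p.sum()).argmax() + 1        # predicted 1-5 star rating

print(predict_stars("the food was not good"))
```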

  • Debiasing word embeddings
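
Debiasing (Bolukbasi et al., 2016) proceeds in three steps: identify the bias direction (e.g. e_he - e_she, averaged over several definitional pairs), neutralize non-definitional words by projecting out that direction, and equalize definitional pairs. A toy sketch with random vectors and a simplified equalize step (the paper also renormalizes lengths):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in
       ["he", "she", "grandmother", "grandfather", "doctor", "babysitter"]}

# 1. Identify the bias direction (here from a single pair for brevity).
g = emb["he"] - emb["she"]
g /= np.linalg.norm(g)

def neutralize(v, g):
    # 2. Remove the component of v along the bias direction.
    return v - (v @ g) * g

def equalize(u, v, g):
    # 3. Simplified equalize: shared orthogonal part, plus equal-magnitude,
    #    opposite-signed components along g.
    mu = (u + v) / 2
    mu_orth = neutralize(mu, g)
    beta = ((u - v) / 2) @ g
    return mu_orth + beta * g, mu_orth - beta * g

emb["doctor"] = neutralize(emb["doctor"], g)          # gender-neutral word
emb["grandmother"], emb["grandfather"] = equalize(
    emb["grandmother"], emb["grandfather"], g)

print(emb["doctor"] @ g)                              # ~0 after neutralizing
```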