Log-bilinear Language Model
The log-bilinear language model computes the probability of the next word $w_i$ given the previous words (the context) as follows:
$$ P(w_i = w | w_{i-1}, ..., w_1) = \frac{ \exp\{ \phi(w)^T c \} }{ \sum_{w^\prime \in V} \exp\{ \phi(w^\prime)^T c \} } $$
Here $\phi(w)$ is the word vector for $w$, and $c$ is the context vector for $w_i$, computed as
$$ c = \sum_{n=1}^{i-1} \alpha_n \phi(w_n) $$
Thus, the log-bilinear language model computes the context vector as a linear combination of the previous words' vectors, where $\alpha_n$ weights the contribution of the word at position $n$. The distribution over the next word $w_i$ is then obtained from the similarity between each word embedding $\phi(w)$ and the context vector, via a softmax over the vocabulary $V$.
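As a concrete illustration, here is a minimal PyTorch sketch of the model above. The names (`LogBilinearLM`, `embed_dim`, `context_size`) and the fixed-length context are assumptions made for this sketch; the formula itself allows a variable-length history.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogBilinearLM(nn.Module):
    """Minimal log-bilinear LM sketch (hypothetical names, fixed-length context)."""
    def __init__(self, vocab_size, embed_dim, context_size):
        super().__init__()
        self.phi = nn.Embedding(vocab_size, embed_dim)  # word vectors phi(w)
        # one scalar weight alpha_n per context position, as in the formula above
        self.alpha = nn.Parameter(torch.full((context_size,), 1.0 / context_size))

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the previous words
        vecs = self.phi(context_ids)                      # (batch, n, d)
        c = (self.alpha.unsqueeze(-1) * vecs).sum(dim=1)  # c = sum_n alpha_n phi(w_n)
        logits = c @ self.phi.weight.T                    # phi(w)^T c for every w in V
        return F.log_softmax(logits, dim=-1)              # log P(w_i = w | context)

# Usage: distribution over the next word after a 3-word context
model = LogBilinearLM(vocab_size=10_000, embed_dim=128, context_size=3)
log_probs = model(torch.randint(0, 10_000, (4, 3)))       # shape (4, 10000)
```

Note that the original log-bilinear model of Mnih and Hinton uses a position-specific matrix $C_n$ in place of the scalar $\alpha_n$; the sketch follows the simpler formulation given above.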
Application
This model is used in the CPC (Contrastive Predictive Coding) paper [2], where a log-bilinear model scores how well a predicted future latent matches the current context.
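For reference, a sketch of how the score appears in CPC: the density ratio $f_k(x_{t+k}, c_t) \propto \exp\{ z_{t+k}^T W_k c_t \}$ is log-bilinear in the future latent $z_{t+k}$ and the context $c_t$, and is trained with the InfoNCE loss. The dimensions and variable names below are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, z_dim, c_dim = 8, 64, 128            # assumed sizes, for illustration
W_k = nn.Linear(c_dim, z_dim, bias=False)   # one linear map W_k per step k ahead

z_future = torch.randn(batch, z_dim)        # encoder latents z_{t+k}
c_t = torch.randn(batch, c_dim)             # autoregressive context vectors c_t

# log f_k for every (z_i, c_j) pair; the diagonal holds the matched pairs
log_scores = z_future @ W_k(c_t).T          # (batch, batch)

# InfoNCE: a softmax over candidates, the same form as the LM softmax above
loss = F.cross_entropy(log_scores, torch.arange(batch))
```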
References
[1] Quora, "What is a log-bilinear model?", www.quora.com/What-is-a-log-bilinear-model
[2] A. van den Oord, Y. Li, O. Vinyals, "Representation Learning with Contrastive Predictive Coding", arXiv:1807.03748, 2018.