Log-bilinear Language Model
The log-bilinear language model computes the probability of the next word $w_i$ given the previous words (the context) as follows:
$$ P(w_i = w | w_{i-1}, ..., w_1) = \frac{ \exp\{ \phi(w)^T c \} }{ \sum_{w^\prime \in V} \exp\{ \phi(w^\prime)^T c \} } $$
Here $\phi(w)$ is the word vector for $w$, and $c$ is the context vector for $w_i$, computed as
$$ c = \sum_{n=1}^{i-1} \alpha_n \phi(w_n) $$
Thus, the log-bilinear language model computes the context vector as a linear combination of the previous words' vectors, where $\alpha_n$ weights the contribution of the word at position $n$. The distribution over the next word $w_i$ is then obtained from the similarity between each word embedding $\phi(w)$ and the context vector, via a softmax over the vocabulary $V$.
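As a concrete illustration, here is a minimal PyTorch sketch of the model above. The names (`LogBilinearLM`, `embed_dim`, `context_size`) and the fixed-length context are assumptions made for this sketch; the formula itself allows a variable-length history.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogBilinearLM(nn.Module):
    """Minimal log-bilinear LM sketch (hypothetical names, fixed-length context)."""
    def __init__(self, vocab_size, embed_dim, context_size):
        super().__init__()
        self.phi = nn.Embedding(vocab_size, embed_dim)  # word vectors phi(w)
        # one scalar weight alpha_n per context position, as in the formula above
        self.alpha = nn.Parameter(torch.full((context_size,), 1.0 / context_size))

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the previous words
        vecs = self.phi(context_ids)                      # (batch, n, d)
        c = (self.alpha.unsqueeze(-1) * vecs).sum(dim=1)  # c = sum_n alpha_n phi(w_n)
        logits = c @ self.phi.weight.T                    # phi(w)^T c for every w in V
        return F.log_softmax(logits, dim=-1)              # log P(w_i = w | context)

# Usage: distribution over the next word after a 3-word context
model = LogBilinearLM(vocab_size=10_000, embed_dim=128, context_size=3)
log_probs = model(torch.randint(0, 10_000, (4, 3)))       # shape (4, 10000)
```

Note that the original log-bilinear model of Mnih and Hinton uses a position-specific matrix $C_n$ in place of the scalar $\alpha_n$; the sketch follows the simpler formulation given above.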
Application
This model is used in the CPC (Contrastive Predictive Coding) paper [2], where a log-bilinear model scores how well a predicted future latent matches the current context.
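For reference, a sketch of how the score appears in CPC: the density ratio $f_k(x_{t+k}, c_t) \propto \exp\{ z_{t+k}^T W_k c_t \}$ is log-bilinear in the future latent $z_{t+k}$ and the context $c_t$, and is trained with the InfoNCE loss. The dimensions and variable names below are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, z_dim, c_dim = 8, 64, 128            # assumed sizes, for illustration
W_k = nn.Linear(c_dim, z_dim, bias=False)   # one linear map W_k per step k ahead

z_future = torch.randn(batch, z_dim)        # encoder latents z_{t+k}
c_t = torch.randn(batch, c_dim)             # autoregressive context vectors c_t

# log f_k for every (z_i, c_j) pair; the diagonal holds the matched pairs
log_scores = z_future @ W_k(c_t).T          # (batch, batch)

# InfoNCE: a softmax over candidates, the same form as the LM softmax above
loss = F.cross_entropy(log_scores, torch.arange(batch))
```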
References
[1] Quora, "What is a log-bilinear model?", www.quora.com/What-is-a-log-bilinear-model
[2] A. van den Oord, Y. Li, O. Vinyals, "Representation Learning with Contrastive Predictive Coding", arXiv:1807.03748, 2018.