Noise Contrastive Estimation and Negative Sampling | Data/Machine learning | 2021. 3. 23. 14:31
Reference: [C. Dyer, 2014, "Notes on Noise Contrastive Estimation and Negative Sampling"]; check out my Mendeley. Estimating the parameters of probabilistic models of language, such as probabilistic neural models, is computationally difficult since it involves evaluating partition functions by summing over an entire vocabulary. Two closely related strategies - noise contrastive estimation and nega..
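As a rough illustration of how negative sampling sidesteps the partition function, here is a minimal PyTorch sketch of a skip-gram-style negative-sampling loss. This is not the post's code; vocab_size, embed_dim, k_neg, and the toy batch are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
vocab_size, embed_dim, k_neg = 1000, 64, 5

in_embed = nn.Embedding(vocab_size, embed_dim)   # "center" word vectors
out_embed = nn.Embedding(vocab_size, embed_dim)  # "context" word vectors

def negative_sampling_loss(center, context, negatives):
    """Binary logistic loss: score the true pair against k noise words,
    avoiding the softmax (and its partition function) over the vocabulary."""
    v = in_embed(center)        # (B, D)
    u_pos = out_embed(context)  # (B, D)
    u_neg = out_embed(negatives)  # (B, k, D)
    pos_score = (v * u_pos).sum(-1)                            # (B,)
    neg_score = torch.bmm(u_neg, v.unsqueeze(-1)).squeeze(-1)  # (B, k)
    loss = -F.logsigmoid(pos_score) - F.logsigmoid(-neg_score).sum(-1)
    return loss.mean()

# Toy batch: random indices stand in for (center, context, noise) words.
B = 8
loss = negative_sampling_loss(
    torch.randint(vocab_size, (B,)),
    torch.randint(vocab_size, (B,)),
    torch.randint(vocab_size, (B, k_neg)),
)
loss.backward()
```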
Variational Auto Encoder (VAE) | Data/Machine learning | 2021. 3. 19. 09:55
Reference: www.jeremyjordan.me/variational-autoencoders/. From the linked post: "In my introductory post on autoencoders, I discussed various models (undercomplete, sparse, denoising, contractive) which take data as input and discover some latent state representation of that data. More specifically, our input data is converted into an .." Key Concepts: We define $x$, $z$ as inpu..
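A minimal sketch of the standard VAE pieces the excerpt starts to define (an encoder producing $\mu$ and $\log\sigma^2$, the reparameterization trick, and the ELBO loss). The layer sizes and names are illustrative assumptions, not the post's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Minimal VAE: x -> q(z|x) -> z -> p(x|z). Sizes are arbitrary."""
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 128)
        self.mu = nn.Linear(128, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(128, z_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, x_dim))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # KL( q(z|x) || N(0, I) ) in closed form for diagonal Gaussians.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

x = torch.rand(32, 784)
x_hat, mu, logvar = VAE()(x)
elbo_loss(x_hat, x, mu, logvar).backward()
```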
Cosine-similarity Classifier; PyTorch Implementation | Data/Machine learning | 2021. 3. 17. 11:10
The cosine-similarity classifier introduced in [S. Gidaris et al., 2018] is implemented here and compared to the linear-softmax classifier. The code can be found on my GitHub. [W. Chen et al., 2020] confirms the performance improvement from the cosine-similarity classifier in the few-shot learning regime. Results in the one-shot learning regime: the models are trained ..
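A hedged sketch of a cosine-similarity classifier in the spirit of [S. Gidaris et al., 2018]: features and class weights are L2-normalized, and their cosine similarity is scaled by a learnable factor. The dimensions and the scale's initial value are assumptions, not the repository's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Classifies by scaled cosine similarity between features and class
    weights, instead of the dot product of a linear-softmax layer."""
    def __init__(self, feat_dim, n_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.scale = nn.Parameter(torch.tensor(scale))  # learnable temperature

    def forward(self, x):
        x = F.normalize(x, dim=-1)            # unit-norm features
        w = F.normalize(self.weight, dim=-1)  # unit-norm class weights
        return self.scale * x @ w.t()         # logits bounded by +/- scale

logits = CosineClassifier(64, 5)(torch.randn(8, 64))
loss = F.cross_entropy(logits, torch.randint(5, (8,)))
```

Normalizing both sides makes the logit depend only on the angle between feature and class weight, which is the property the few-shot comparison above exploits.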
Dilated Causal Convolution from WaveNet | Data/Machine learning | 2021. 3. 1. 13:20
Concept It was first proposed in the WaveNet paper, in which Google developed a model to generate realistic-sounding speech from text. You can try WaveNet's text-to-speech here. A comparison between models with and without the dilated causal convolution (DCC) is shown in the following figure: the DCC covers a longer time series, which allows the model to capture the global e..
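A minimal PyTorch sketch of a dilated causal convolution: left-padding keeps the convolution causal (the output at time t sees only inputs up to t), and dilation widens the receptive field. Channel counts and the dilation schedule are illustrative assumptions, not WaveNet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalConv1d(nn.Module):
    """1D convolution made causal by padding only on the left,
    with dilation to widen the receptive field."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-side padding only
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):  # x: (B, C, T)
        return self.conv(F.pad(x, (self.pad, 0)))

# Stacking with dilations 1, 2, 4, 8 roughly doubles the receptive field per
# layer, which is how WaveNet covers long contexts cheaply.
stack = nn.Sequential(*[DilatedCausalConv1d(16, dilation=2**i) for i in range(4)])
y = stack(torch.randn(1, 16, 100))  # output length is preserved: (1, 16, 100)
```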
[PyTorch] .detach() | Data/Machine learning | 2021. 1. 28. 16:42
To stop a tensor from having its history tracked, you can call .detach() to separate it from the computation history and prevent future computations on it from being tracked. (Source) Example (Source: here)

```python
import torch
import torch.nn as nn

modelA = nn.Linear(10, 10)
modelB = nn.Linear(10, 10)
modelC = nn.Linear(10, 10)

x = torch.randn(1, 10)
a = modelA(x)
b = modelB(a.detach())     # detach cuts the graph: no gradient can flow back into modelA
b.mean().backward()
print(modelA.weight.grad)  # None: the detached branch never reaches modelA
print(modelB.weight.grad)  # populated

c = modelC(a)              # no detach here, so gradients flow through a into modelA
c.mean().backward()
print(modelA.weight.grad)  # ..
```
[PyTorch] .detach() in Loss Function | Data/Machine learning | 2021. 1. 28. 14:15
What happens if you put .detach() in a loss function, as in the SimSiam algorithm? Example 1 Let's say we have the following equations: $$ J = y_1 y_2 $$ $$ y_1 = 2x $$ $$ y_2 = 3x $$ Then, naturally: $$ J = (2x)(3x) = 6x^2, \qquad \frac{dJ}{dx} = 12x $$ However, if .detach() is applied to $y_1$, we treat $y_1$ as a constant when computing derivatives: $$ \frac{\partial..
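A small runnable check of the arithmetic above using PyTorch autograd (the tensor names mirror the equations):

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y1, y2 = 2 * x, 3 * x

# Full gradient: J = 6x^2, so dJ/dx = 12x = 12 at x = 1.
J = y1 * y2
J.backward()
print(x.grad)   # tensor(12.)

x.grad = None   # reset the accumulated gradient

# With y1 detached it is treated as a constant, so dJ/dx = y1 * dy2/dx = 2x * 3 = 6.
y1, y2 = 2 * x, 3 * x
J = y1.detach() * y2
J.backward()
print(x.grad)   # tensor(6.)
```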
The Boosting Algorithm | Data/Machine learning | 2021. 1. 12. 11:05
In this posting, we address the basic idea behind the various boosting algorithms. The easiest to understand is AdaBoost, which proceeds as follows. (The commonly used boosting algorithms are AdaBoost, gradient boosting, and stochastic gradient boosting, the last being the most common.) Initialize $M$, the maximum number of models to be fit, and set the iteration counter $m = 1$. Initialize th..
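A minimal sketch of the AdaBoost loop the excerpt begins to describe, assuming decision stumps from scikit-learn as the weak learners and labels in {-1, +1}; this is an illustration, not the post's code.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Minimal AdaBoost: refit a stump on reweighted data each round.
    Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1 / n)  # uniform initial observation weights
    models, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:     # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
        w *= np.exp(-alpha * y * pred)  # upweight misclassified points
        w /= w.sum()
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    scores = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(scores)
```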
Distance Metrics | Data/Machine learning | 2021. 1. 11. 17:09
Similarity (nearness) is determined using a distance metric, which is a function that measures how far apart two records $(x_1, x_2, \cdots, x_p)$ and $(u_1, u_2, \cdots, u_p)$ are. Euclidean Distance $$ \sqrt{(x_1 - u_1)^2 + (x_2 - u_2)^2 + \cdots + (x_p - u_p)^2} $$ Manhattan Distance $$ |x_1 - u_1| + |x_2 - u_2| + \cdots + |x_p - u_p| $$ The Manhattan distance is the distance between two points travers..
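Both formulas in a few lines of NumPy, for illustration (the function names are mine):

```python
import numpy as np

def euclidean(x, u):
    """Straight-line (L2) distance between two records."""
    x, u = np.asarray(x), np.asarray(u)
    return np.sqrt(np.sum((x - u) ** 2))

def manhattan(x, u):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    x, u = np.asarray(x), np.asarray(u)
    return np.sum(np.abs(x - u))

print(euclidean([0, 0], [3, 4]))  # 5.0
print(manhattan([0, 0], [3, 4]))  # 7
```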