-
[PyTorch] .detach() (Data/Machine learning, 2021. 1. 28. 16:42)
To stop a Tensor from tracking its history, you can call .detach() to separate it from the computation history, so that subsequent operations are no longer tracked. (source) Example (Source: here)

modelA = nn.Linear(10, 10)
modelB = nn.Linear(10, 10)
modelC = nn.Linear(10, 10)
x = torch.randn(1, 10)
a = modelA(x)
b = modelB(a.detach())
b.mean().backward()
print(modelA.weight.grad)
print(modelB.weight.grad)
c = modelC(a)
c.mean().backward()
print(modelA.weight.grad) ..
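For reference, a minimal runnable sketch of the detach experiment in the excerpt (modelC is omitted here; the comments state the gradients PyTorch produces in this setup):

import torch
import torch.nn as nn

modelA = nn.Linear(10, 10)
modelB = nn.Linear(10, 10)

x = torch.randn(1, 10)
a = modelA(x)
b = modelB(a.detach())  # gradients flowing back from b stop at the detach
b.mean().backward()

print(modelA.weight.grad)  # None: the detached path carries no gradient
print(modelB.weight.grad)  # a tensor: modelB is still in the graph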
-
[PyTorch] .detach() in Loss Function (Data/Machine learning, 2021. 1. 28. 14:15)
What happens if you put .detach() in a loss function, like in the SimSiam algorithm? Example 1 Let's say we have the following equations: $$ J = y_1 y_2 $$ $$ y_1 = 2 x $$ $$ y_2 = 3 x $$ Then, naturally, the derivative of $J$ w.r.t. $x$ is: $$ \frac{dJ}{dx} = \frac{d}{dx}\left[ (2x)(3x) \right] = \frac{d}{dx}\, 6x^2 = 12x $$ However, if .detach() is applied to $y_1$, we treat $y_1$ as a constant when computing derivatives: $$ \frac{\partial..
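A quick autograd check of this worked example (a minimal sketch, evaluated at x = 2; the printed values follow from the derivatives above):

import torch

x = torch.tensor(2.0, requires_grad=True)

# Full graph: J = (2x)(3x) = 6x^2, so dJ/dx = 12x = 24 at x = 2
(2 * x * (3 * x)).backward()
print(x.grad)  # tensor(24.)

x.grad = None  # reset the accumulated gradient

# With y1 detached, it is treated as a constant: dJ/dx = y1 * 3 = 6x = 12
y1, y2 = 2 * x, 3 * x
(y1.detach() * y2).backward()
print(x.grad)  # tensor(12.)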
-
The Boosting Algorithm (Data/Machine learning, 2021. 1. 12. 11:05)
In this posting, we address the basic idea behind the various boosting algorithms. The easiest to understand is AdaBoost, which proceeds as follows. (Note that the commonly used boosting algorithms are AdaBoost, gradient boosting, and stochastic gradient boosting, the last of which is the most common.) Initialize $M$, the maximum number of models to be fit, and set the iteration counter $m=1$. Initialize th..
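A minimal sketch of the AdaBoost loop just described, assuming binary labels in {-1, +1} and a hypothetical fit_weak_learner(X, y, w) helper that fits a classifier under observation weights w (this is the exponential-loss variant of the weight update):

import numpy as np

def adaboost(X, y, fit_weak_learner, M=50):
    n = len(y)
    w = np.full(n, 1.0 / n)              # initialize observation weights
    models, alphas = [], []
    for m in range(M):
        model = fit_weak_learner(X, y, w)  # hypothetical helper
        pred = model(X)                    # predictions in {-1, +1}
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted error
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)      # model weight
        w *= np.exp(-alpha * y * pred)             # up-weight mistakes
        w /= w.sum()
        models.append(model)
        alphas.append(alpha)
    # Final classifier: sign of the weighted vote of the fitted models
    return lambda Xn: np.sign(sum(a * f(Xn) for a, f in zip(alphas, models)))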
-
Distance Metrics (Data/Machine learning, 2021. 1. 11. 17:09)
Similarity (nearness) is determined using a distance metric, which is a function that measures how far apart two records $(x_1, x_2, \cdots, x_p)$ and $(u_1, u_2, \cdots, u_p)$ are. Euclidean Distance $$ \sqrt{(x_1 - u_1)^2 + (x_2 - u_2)^2 + \cdots + (x_p - u_p)^2} $$ Manhattan Distance $$ |x_1 - u_1| + |x_2 - u_2| + \cdots + |x_p - u_p| $$ The Manhattan distance is the distance between two points travers..
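For concreteness, the two formulas in NumPy (a small sketch with made-up records x and u):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
u = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((x - u) ** 2))  # sqrt(9 + 4 + 0) ~ 3.606
manhattan = np.sum(np.abs(x - u))          # 3 + 2 + 0 = 5.0

print(euclidean, manhattan)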
-
Problems with Clustering Mixed Data (Data/Machine learning, 2021. 1. 10. 08:26)
Mixed data: data where numeric variables and categorical variables coexist. Problem Statement K-means and PCA are most appropriate for continuous variables. For smaller datasets, it is better to use hierarchical clustering with Gower's distance. In principle there is no reason why K-means can't be applied to binary or categorical data. You would usually use the "one hot encoder" representation..
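A sketch of the one-hot-encoder route mentioned above (assumes scikit-learn and pandas; the income/region frame is made up for illustration):

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "income": [40_000, 85_000, 62_000, 31_000],      # numeric
    "region": ["north", "south", "south", "north"],  # categorical
})

# One-hot encode the categorical column and scale the numeric one so
# neither dominates the Euclidean distances K-means relies on.
num = StandardScaler().fit_transform(df[["income"]])
cat = OneHotEncoder().fit_transform(df[["region"]]).toarray()
X = np.hstack([num, cat])

print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))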
-
Gower's Distance (Data/Machine learning, 2021. 1. 8. 17:16)
It measures the dissimilarity between records within a mixed dataset, i.e. a dataset where numeric features (variables) and categorical features (variables) exist together. The basic idea behind Gower's distance is to apply a different distance metric to each variable depending on the type of data: For numeric variables and ordered factors (ordered categorical variabl..
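A bare-bones illustration of that idea (a sketch, not a library implementation; gower_distance and its arguments are hypothetical names):

import numpy as np

def gower_distance(a, b, numeric_ranges, is_numeric):
    # a, b: two records as equal-length lists; numeric_ranges[i] is the
    # feature's range over the whole dataset; is_numeric[i] marks the type.
    per_feature = []
    for ai, bi, r, num in zip(a, b, numeric_ranges, is_numeric):
        if num:
            per_feature.append(abs(ai - bi) / r)          # range-normalized Manhattan
        else:
            per_feature.append(0.0 if ai == bi else 1.0)  # simple mismatch
    return np.mean(per_feature)

# age (numeric, dataset range 50) and color (categorical)
print(gower_distance([30, "red"], [40, "blue"], [50, None], [True, False]))
# (10/50 + 1) / 2 = 0.6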
-
Outlier Detection with Multivariate Normal Distribution (Data/Machine learning, 2021. 1. 8. 13:28)
Let's say we have a dataset with two random variables, each following a normal distribution. Then, we can build a 2-dimensional normal distribution that fits the dataset, whose probability contours are ellipses. Since we usually consider data outside of the 95% confidence level to be outliers, we can set the boundary between non-outliers and outliers as the ellipse with the probability o..
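A sketch of that boundary in code, using the squared Mahalanobis distance, which for a p-dimensional normal follows a chi-square distribution with p degrees of freedom (the mean and covariance here are made up):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500)

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

# Squared Mahalanobis distance; the 95% contour is the ellipse at
# chi2.ppf(0.95, df=2), so points beyond it are flagged as outliers.
d2 = np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu)
outliers = X[d2 > chi2.ppf(0.95, df=2)]
print(len(outliers))  # roughly 5% of the 500 points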
-
Hierarchical Clustering (Agglomerative Algorithm) (Data/Machine learning, 2021. 1. 7. 12:05)
Hierarchical clustering provides an intuitive graphical display (e.g. a dendrogram). However, it cannot be used with a large dataset: even a modest-sized dataset with just tens of thousands of records can require intensive computing resources. In fact, most applications of hierarchical clustering are focused on relatively small datasets. Algorithm (Hierarchical Clustering = Agglomer..
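A small sketch of the agglomerative procedure with SciPy (two made-up Gaussian blobs; Ward linkage is one common choice of merge criterion):

import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])

Z = linkage(X, method="ward")                     # agglomerative merge history
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)

# dendrogram(Z)  # with matplotlib, draws the intuitive tree display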