Posts in the 'Computer Science/Machine learning' category

TBD. Last update: 2024/03/30

2024.03.30

14. Generative Model

Generative Model: Given a training dataset, we want to generate new samples from the same distribution. In other words, the goal of a generative model is to estimate the density of the data. 💡 Learn $p_{\text{model}}(x)$ that approximates $p_{\text{data}}(x)$. Applications of generative models: density estimation, data exploration, anomaly detection, image-to-image translation, super-resolution: Increase the re..
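
As a minimal illustration of "learn $p_{\text{model}}(x)$ that approximates $p_{\text{data}}(x)$" (a sketch, not the post's method), the simplest explicit density model is a single 1-D Gaussian: fit it to the samples by maximum likelihood, then draw new samples from it. The stand-in $p_{\text{data}} = \mathcal{N}(3, 2^2)$ and all function names are illustrative assumptions; real generative models such as GANs or VAEs are far more expressive.

```python
import math
import random


def fit_gaussian(samples):
    """Maximum-likelihood estimates (mean, variance) of a 1-D Gaussian p_model."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # the MLE divides by n, not n - 1
    return mu, var


def log_density(x, mu, var):
    """log p_model(x) under N(mu, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def sample(mu, var, k, rng):
    """Generate k new samples from the fitted p_model."""
    return [rng.gauss(mu, math.sqrt(var)) for _ in range(k)]


rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(10_000)]  # assumed p_data = N(3, 2^2)
mu, var = fit_gaussian(data)
new_samples = sample(mu, var, 5, rng)
print(f"mu = {mu:.3f}, var = {var:.3f}, new samples = {[round(s, 2) for s in new_samples]}")
```

A low `log_density` value can also serve as a crude anomaly score, which is one of the applications mentioned in the excerpt.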

2023.07.01

13. Dimensionality Reduction

Curse of Dimensionality: Datasets are typically high-dimensional. → As the dimension grows, the number of observations per region shrinks and the computational cost rises. This phenomenon is called the curse of dimensionality. Observed Dimensionality: The space in which the data actually lie has a smaller dimension than the observed one. Dimensionality Reduction: Mapping a high-dimensional space to a low-dimensional space while preserving the properties of the original data. → It is commonly used for feature extraction, data compression, d..
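
To make "fewer observations per region" concrete, here is a small hypothetical experiment (not from the post): scatter a fixed number of uniform points in $[0, 1]^d$, split every axis in half, and count how many of the $2^d$ resulting cells contain at least one point. The bin count and point count are arbitrary assumptions.

```python
import random


def occupancy(n_points, dim, bins_per_axis=2, seed=0):
    """Drop n_points uniformly into [0, 1]^dim, split each axis into bins_per_axis equal
    intervals, and return (number of cells, fraction of cells holding at least one point)."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(n_points):
        # rng.random() lies in [0, 1), so each index is in 0 .. bins_per_axis - 1
        cell = tuple(int(rng.random() * bins_per_axis) for _ in range(dim))
        occupied.add(cell)
    n_cells = bins_per_axis ** dim
    return n_cells, len(occupied) / n_cells


for d in (1, 2, 5, 10, 20):
    cells, frac = occupancy(1000, d)
    print(f"d={d:2d}  cells={cells:>8d}  points/cell={1000 / cells:9.4f}  non-empty={frac:.4f}")
```

With the sample size held fixed, the average number of points per cell is $1000 / 2^d$, so by $d = 20$ almost every cell is empty.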

2023.07.01

12. Expectation Maximization

Gaussian Mixture Model (GMM): We want to estimate the distribution of the given unlabeled training set $\{x^{(1)}, \dots, x^{(n)}\}$. To do so, we model the joint distribution $p(x^{(i)}, z^{(i)}) = p(x^{(i)} \mid z^{(i)})\,p(z^{(i)})$, where $z^{(i)} \sim \text{Multinomial}(\phi)$ and $x^{(i)} \mid z^{(i)} = j \sim \mathcal{N}(\mu_j, \Sigma_j)$..
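
A minimal sketch of EM for this model, assuming 1-D data so that each $\Sigma_j$ reduces to a scalar variance. The E-step computes the posterior $p(z^{(i)} = j \mid x^{(i)})$ and the M-step applies the closed-form updates for $\phi$, $\mu_j$ and the variances. The quantile-based initialisation, the small variance floor, and the two-cluster toy data are illustrative choices, not taken from the post.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, k, n_iter=100):
    """EM for a 1-D mixture: z ~ Multinomial(phi), x | z = j ~ N(mu_j, var_j)."""
    n = len(xs)
    ordered = sorted(xs)
    phi = [1.0 / k] * k
    mu = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # spread initial means over the data
    mean = sum(xs) / n
    var = [sum((x - mean) ** 2 for x in xs) / n] * k  # start every component at the overall variance
    for _ in range(n_iter):
        # E-step: responsibilities w[i][j] = p(z_i = j | x_i; phi, mu, var)
        w = []
        for x in xs:
            joint = [phi[j] * normal_pdf(x, mu[j], var[j]) for j in range(k)]
            total = sum(joint)
            w.append([p / total for p in joint])
        # M-step: closed-form maximisers of the expected complete-data log-likelihood
        for j in range(k):
            n_j = sum(w[i][j] for i in range(n))
            phi[j] = n_j / n
            mu[j] = sum(w[i][j] * xs[i] for i in range(n)) / n_j
            # small floor keeps a component from collapsing to zero variance
            var[j] = sum(w[i][j] * (xs[i] - mu[j]) ** 2 for i in range(n)) / n_j + 1e-6
    return phi, mu, var


rng = random.Random(1)
xs = [rng.gauss(-5.0, 1.0) for _ in range(500)] + [rng.gauss(5.0, 1.0) for _ in range(500)]
phi, mu, var = em_gmm_1d(xs, k=2)
print("phi =", [round(p, 3) for p in phi], "mu =", [round(m, 2) for m in mu], "var =", [round(v, 2) for v in var])
```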

2023.07.01

11. Clustering and K-means Algorithm

Unsupervised Learning: Clustering (K-means algorithm, Mixture of Gaussians); Dimensionality reduction (Principal component analysis (PCA), Factor analysis, Mixture of factor analyzers, Kernel PCA, t-SNE); Generative models (Generative adversarial networks (GAN), Variational Autoencoder (VAE)). Different definitions of likelihood: Let's break down the likelihood functions for generative and discriminative models in both supervis..
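
For the K-means algorithm in the outline above, a minimal pure-Python sketch of Lloyd's iteration (assignment step, then centroid update, repeated until the centroids stop moving). Random initialisation from the data points and the two-blob toy data are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it
        new_centroids = []
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid where it is
        if new_centroids == centroids:
            break  # converged: the assignments will not change any more
        centroids = new_centroids
    return centroids, labels


rng = random.Random(42)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
pts += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
centroids, labels = kmeans(pts, k=2)
print([tuple(round(c, 2) for c in ctr) for ctr in centroids])
```

K-means only reaches a local optimum of the within-cluster squared distance, so in practice it is usually restarted from several initialisations.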

2023.07.01

10. Generalization and Regularization

Training vs Test Errors: In regression problems, the loss function is usually the MSE: $J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(y^{(i)} - h_\theta(x^{(i)})\right)^2$. The test error, however, is defined differently: $L(\theta) = E_{(x, y) \sim \mathcal{D}}\left[(y - h_\theta(x))^2\right]$, where a test sample $(x, y)$ is sampled from the so cal..
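
A small sketch of the difference between $J(\theta)$ and $L(\theta)$: $J$ averages over the finite training set, while $L$ is an expectation over $\mathcal{D}$, estimated here by Monte Carlo with many fresh samples. The distribution $\mathcal{D}$ (with $x \sim U(0, 1)$, $y = 2x + \text{noise}$), the one-parameter linear hypothesis, and the 1-nearest-neighbour comparison are all illustrative assumptions.

```python
import random

SIGMA = 0.5  # noise level of the assumed distribution D


def draw(n, rng):
    """Sample n pairs from an assumed D: x ~ U(0, 1), y = 2x + N(0, SIGMA^2)."""
    pairs = []
    for _ in range(n):
        x = rng.random()
        pairs.append((x, 2 * x + rng.gauss(0, SIGMA)))
    return pairs


def mse(h, pairs):
    """Mean squared error of hypothesis h over a finite set of (x, y) pairs."""
    return sum((y - h(x)) ** 2 for x, y in pairs) / len(pairs)


rng = random.Random(0)
train = draw(30, rng)

# Hypothesis 1: h(x) = theta * x, with theta minimising the training MSE J(theta)
theta = sum(x * y for x, y in train) / sum(x * x for x, _ in train)


def linear(x):
    return theta * x


# Hypothesis 2: 1-nearest-neighbour lookup, which memorises the training set
def nearest(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]


test = draw(20_000, rng)  # fresh samples from D give a Monte Carlo estimate of L(theta)
for name, h in (("linear", linear), ("1-NN", nearest)):
    print(f"{name:6s}  training J = {mse(h, train):.3f}   estimated test L = {mse(h, test):.3f}")
```

The memorising hypothesis reaches a training error of exactly zero, yet its estimated test error is larger than that of the simple linear fit, which is the gap between training and test error that generalization is about.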

2023.07.01

9. Backpropagation

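Back-propagation: The downstream gradient can be obtained from the upstream gradient and the local gradient. Computational Graph: The key idea is to break a complex function into small functions. The downstream gradient is the upstream gradient multiplied by the local gradient. 💡 The local gradient can be computed either from the function's output value or from its input values; try working it out both ways. Grouping operations into a single sigmoid unit can also make the computation more convenient. 💡 The local gradient is easy to obtain from the function's output value. Here, $0.73 \times 0.27$..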
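
The sketch below walks through a sigmoid neuron $f = 1 / (1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)})$ gate by gate, the kind of example commonly used to teach backprop. The inputs $w = [2, -3, -3]$ and $x = [-1, -2]$ are an assumption, chosen so the sigmoid's input is 1 and its output is about 0.73, which would be consistent with the $0.73 \times 0.27$ in the excerpt. It also shows that treating the whole sigmoid as one unit, with local gradient $\sigma(1 - \sigma)$ computed from the output, gives the same result as chaining the small gates.

```python
import math


def forward_backward(w, x):
    """f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))), broken into small gates."""
    # ---- forward pass: evaluate each small function and keep its value ----
    a = w[0] * x[0] + w[1] * x[1] + w[2]  # linear part
    b = -a                                 # negate
    c = math.exp(b)                        # exp
    d = 1.0 + c                            # add 1
    f = 1.0 / d                            # reciprocal

    # ---- backward pass: downstream = upstream * local ----
    df = 1.0                               # df/df
    dd = df * (-1.0 / d ** 2)              # local grad of 1/d is -1/d^2 (from the input d)
    dc = dd * 1.0                          # local grad of 1 + c is 1
    db = dc * c                            # local grad of exp(b) is exp(b) = c (its own output)
    da = db * -1.0                         # local grad of -a is -1
    # Sigmoid shortcut: treat a -> f as one unit whose local grad f * (1 - f) uses only the output
    da_sigmoid = df * f * (1.0 - f)
    dw = [da * x[0], da * x[1], da * 1.0]  # gradients w.r.t. the weights
    dx = [da * w[0], da * w[1]]            # gradients w.r.t. the inputs
    return f, da, da_sigmoid, dw, dx


f, da, da_sigmoid, dw, dx = forward_backward([2.0, -3.0, -3.0], [-1.0, -2.0])
print(f"f = {f:.2f}, df/da via gates = {da:.4f}, via sigmoid unit = {da_sigmoid:.4f}")
print("dw =", [round(g, 3) for g in dw], "dx =", [round(g, 3) for g in dx])
```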

2023.07.01

8. Neural Networks

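Linear Predictor for Binary Classification: The decision rule can be described by $g(x) = \text{sign}(f(x)) = \begin{cases} +1 & \text{if } w^T x + b \ge 0 \\ -1 & \text{if } w^T x + b < 0 \end{cases}$ … $y_i w^T x_i > 0, \ \forall i \in \{1, \cdots, n\}$ 💡 The product is positive exactly when the two signs agree, which is why the condition is written this way. This expression is called the perceptron criterion (it is essentially the same formula as the functional margin). The objective function to be minimized is therefore defined as $\mathcal{J}(w) = -\sum_{(x_i, y_i) \in \mathcal{M}(w)} y_i w^T x_i$..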
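
A minimal sketch of minimising $\mathcal{J}(w)$ by batch gradient descent (the batch perceptron), where $\mathcal{M}(w)$ is the set of samples with $y_i w^T x_i \le 0$, i.e. the misclassified ones. Since $\nabla \mathcal{J}(w) = -\sum_{\mathcal{M}(w)} y_i x_i$, each step adds $\eta \sum_{\mathcal{M}(w)} y_i x_i$ to $w$. The assumptions here: labels are in $\{+1, -1\}$, the bias $b$ is absorbed by appending a constant 1 to every $x$ (matching the criterion being written without $b$), the learning rate is 1, and the toy data are hypothetical and linearly separable.

```python
def dot(u, v):
    """Inner product w^T x."""
    return sum(a * b for a, b in zip(u, v))


def predict(w, x):
    """g(x) = sign(w^T x); the bias b is folded into w via a constant-1 feature, and sign(0) -> +1."""
    return 1 if dot(w, x) >= 0 else -1


def perceptron_criterion(w, data):
    """J(w) = -sum of y_i w^T x_i over M(w), the samples with y_i w^T x_i <= 0."""
    return -sum(y * dot(w, x) for x, y in data if y * dot(w, x) <= 0)


def batch_perceptron(data, lr=1.0, max_epochs=1000):
    """Gradient descent on J(w); its gradient is -sum of y_i x_i over M(w)."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        misclassified = [(x, y) for x, y in data if y * dot(w, x) <= 0]
        if not misclassified:
            break  # y_i w^T x_i > 0 holds for every i
        for k in range(len(w)):
            w[k] += lr * sum(y * x[k] for x, y in misclassified)
    return w


# Hypothetical, linearly separable toy data; the last feature is the constant 1 that absorbs b
data = [
    ((2.0, 2.0, 1.0), 1), ((3.0, 1.0, 1.0), 1), ((2.5, 3.0, 1.0), 1),
    ((-1.0, -1.0, 1.0), -1), ((0.0, -2.0, 1.0), -1), ((-2.0, 0.5, 1.0), -1),
]
w = batch_perceptron(data)
print("w =", w, " J(w) =", perceptron_criterion(w, data))
```

Counting $y_i w^T x_i = 0$ as misclassified matters: starting from $w = 0$, every sample satisfies it, so the first update is never empty.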

2023.07.01
