Machine Learning 10: Principal Component Analysis


PRINCIPAL COMPONENT ANALYSIS

[Figure: GaussianScatterPCA.jpg]


PCA finds a new coordinate system that is obtained from the old one by translation and rotation only: it first centers the data, then rotates the axes. The goal is to build a composite feature that more directly probes the underlying phenomenon (square footage + number of rooms → size).
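As a minimal sketch of the square-footage/rooms example (with made-up house data, not from the original notes), PCA can collapse the two correlated features into a single "size" component:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical house data: columns are [square footage, number of rooms].
# The two features are strongly correlated -- both really measure "size".
X = np.array([[1000, 3],
              [1500, 4],
              [2000, 5],
              [2500, 6],
              [3000, 7]], dtype=float)

# PCA centers the data internally, then rotates the axes.
pca = PCA(n_components=1)
size = pca.fit_transform(X)  # one composite "size" value per house

# With such correlated features, the first component captures
# essentially all of the variance.
print(pca.explained_variance_ratio_)
```

Because the two columns here are perfectly correlated, the first component explains (almost exactly) all of the variance; on real data the ratio would be lower but typically still dominant.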


How to determine the PCA:
The principal component of a dataset is the direction with the largest variance (variance = the spread of the data distribution), because that direction retains the maximum amount of the original information. Projecting the original data onto the longest axis of the new coordinate system keeps the values as spread out as possible, so the projection loses the minimum amount of information.

[Figure: projection.png]
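The "largest variance" claim can be checked directly with a small numpy-only sketch (synthetic data, my own construction): compute the eigenvectors of the covariance matrix of centered data, then compare the spread of the projections onto each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D cloud: large spread along one diagonal direction,
# only small noise perpendicular to it.
t = rng.normal(size=500)
noise = rng.normal(scale=0.1, size=500)
X = np.column_stack([t, t + noise])

# PCA by hand: center the data, then take the eigenvectors of the
# covariance matrix. eigh returns eigenvalues in ascending order.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
first_pc = eigvecs[:, -1]   # direction of largest variance
second_pc = eigvecs[:, 0]   # direction of smallest variance

# Projecting onto the first PC preserves far more spread (information)
# than projecting onto the second.
print(np.var(Xc @ first_pc), np.var(Xc @ second_pc))
```

The variance along the first principal component is orders of magnitude larger than along the second, which is exactly why projecting onto it loses the least information.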


When to use:
  • access latent features
  • dimensionality reduction
    • visualize high-dimensional data
    • reduce noise
    • use as preprocessing (reducing the input for a later algorithm, e.g. eigenfaces)
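A sketch of the preprocessing use case, in the spirit of eigenfaces but on sklearn's built-in digits dataset instead of face images (the dataset, component count, and classifier choice here are my own illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# PCA compresses the 64 pixel features down to 16 components before the
# classifier sees them -- the same idea as eigenfaces for face images.
model = make_pipeline(PCA(n_components=16), SVC())
model.fit(X_train, y_train)

acc = model.score(X_test, y_test)
print(acc)
```

Despite throwing away three quarters of the input dimensions, the pipeline keeps most of the discriminative information, which is why PCA is a popular preprocessing step.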


Sklearn:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(data)  # fit on the training data
print(pca.explained_variance_ratio_)  # fraction of variance kept by each component
first_pc = pca.components_[0]
second_pc = pca.components_[1]
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
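A common follow-up question is how many components to keep. One way (shown here on sklearn's built-in digits data, my choice of example) is to fit PCA with all components and read off the cumulative explained variance ratio:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # 64 pixel features per sample

# Fit with all components, then keep just enough of them
# to explain 95% of the total variance.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_keep)  # components needed to reach 95% of the variance
```

sklearn also supports this directly: passing a float in (0, 1) as `n_components` (e.g. `PCA(n_components=0.95)`) selects the number of components needed to explain that fraction of the variance.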
