Machine Learning 10: Principal Component Analysis


PRINCIPAL COMPONENT ANALYSIS

[Figure: GaussianScatterPCA.jpg]


PCA finds a new coordinate system that is obtained from the old one by translation and rotation only: it first centers the data, then rotates the axes. The goal is to build a composite feature that more directly probes the underlying phenomenon (square footage + number of rooms → size).
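As a minimal sketch of the square-footage/rooms example (with made-up house data, not from the original notes), PCA can collapse the two correlated features into a single "size" component:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical house data: columns are [square footage, number of rooms].
# The two features are strongly correlated -- both really measure "size".
X = np.array([[1000, 3],
              [1500, 4],
              [2000, 5],
              [2500, 6],
              [3000, 7]], dtype=float)

# PCA centers the data internally, then rotates the axes.
pca = PCA(n_components=1)
size = pca.fit_transform(X)  # one composite "size" value per house

# With such correlated features, the first component captures
# essentially all of the variance.
print(pca.explained_variance_ratio_)
```

Because the two columns here are perfectly correlated, the first component explains (almost exactly) all of the variance; on real data the ratio would be lower but typically still dominant.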


How to determine the PCA:
The principal component of a dataset is the direction with the largest variance (variance = the spread of the data distribution), because that direction retains the maximum amount of the original information. Projecting the original data onto the longest axis of the new coordinate system keeps the values as spread out as possible, so the projection loses the minimum amount of information.

[Figure: projection.png]
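The "largest variance" claim can be checked directly with a small numpy-only sketch (synthetic data, my own construction): compute the eigenvectors of the covariance matrix of centered data, then compare the spread of the projections onto each axis.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D cloud: large spread along one diagonal direction,
# only small noise perpendicular to it.
t = rng.normal(size=500)
noise = rng.normal(scale=0.1, size=500)
X = np.column_stack([t, t + noise])

# PCA by hand: center the data, then take the eigenvectors of the
# covariance matrix. eigh returns eigenvalues in ascending order.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
first_pc = eigvecs[:, -1]   # direction of largest variance
second_pc = eigvecs[:, 0]   # direction of smallest variance

# Projecting onto the first PC preserves far more spread (information)
# than projecting onto the second.
print(np.var(Xc @ first_pc), np.var(Xc @ second_pc))
```

The variance along the first principal component is orders of magnitude larger than along the second, which is exactly why projecting onto it loses the least information.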


When to use:
  • access latent features
  • dimensionality reduction
    • visualize high-dimensional data
    • reduce noise
    • use as preprocessing (reducing the input for a later algorithm, e.g. eigenfaces)
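A sketch of the preprocessing use case, in the spirit of eigenfaces but on sklearn's built-in digits dataset instead of face images (the dataset, component count, and classifier choice here are my own illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# PCA compresses the 64 pixel features down to 16 components before the
# classifier sees them -- the same idea as eigenfaces for face images.
model = make_pipeline(PCA(n_components=16), SVC())
model.fit(X_train, y_train)

acc = model.score(X_test, y_test)
print(acc)
```

Despite throwing away three quarters of the input dimensions, the pipeline keeps most of the discriminative information, which is why PCA is a popular preprocessing step.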


Sklearn:
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(data)  # fit on the training data
print(pca.explained_variance_ratio_)  # fraction of variance kept by each component
first_pc = pca.components_[0]
second_pc = pca.components_[1]
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
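A common follow-up question is how many components to keep. One way (shown here on sklearn's built-in digits data, my choice of example) is to fit PCA with all components and read off the cumulative explained variance ratio:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data  # 64 pixel features per sample

# Fit with all components, then keep just enough of them
# to explain 95% of the total variance.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_keep = int(np.searchsorted(cumulative, 0.95) + 1)
print(n_keep)  # components needed to reach 95% of the variance
```

sklearn also supports this directly: passing a float in (0, 1) as `n_components` (e.g. `PCA(n_components=0.95)`) selects the number of components needed to explain that fraction of the variance.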
