Machine Learning 11: Cross Validation
CROSS VALIDATION
One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set).
Conventional validation partitions the data set into two complementary sets, for example 70% for training and 30% for testing.
Sklearn:
from sklearn.model_selection import train_test_split   # sklearn.cross_validation in older releases
feature_train, feature_test, label_train, label_test = train_test_split(
    iris_data, iris_target, test_size=0.4, random_state=0)
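As a quick sanity check (a sketch, assuming the iris arrays above), test_size=0.4 splits the 150 iris samples into 90 training and 60 test rows:

print(len(feature_train), len(feature_test))   # 90 60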
[train]
pca.fit(feature_train)                        # learn the PCA components from the training set only
feature_train_pca = pca.transform(feature_train)
svc.fit(feature_train_pca, label_train)       # fit the classifier on the transformed training data
[test]
NO FIT (reuse the exact transformation learned on the training set)
feature_test_pca = pca.transform(feature_test)
pred = svc.predict(feature_test_pca)          # predict on the test set, never re-fit
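The same discipline can be enforced automatically with a Pipeline; a minimal runnable sketch, assuming the iris split from above (the n_components=2 and linear kernel settings are just illustrative):

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# The pipeline fits PCA and SVC on the training data in one call,
# then applies the already-fitted PCA when scoring the test set.
clf = Pipeline([('pca', PCA(n_components=2)), ('svc', SVC(kernel='linear'))])
clf.fit(feature_train, label_train)
print(clf.score(feature_test, label_test))    # accuracy on the held-out 40%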
K-Fold:
K-fold cross-validation partitions the data into k equal blocks and rotates which block is held out. In words:
- repeat k times:
- pick one block of data as the test set
- train on the other k-1 blocks
- evaluate on the held-out block
- average the k results (see the cross_val_score sketch after the code below)
Sklearn:
from sklearn.model_selection import KFold   # sklearn.cross_validation in older releases

kf = KFold(n_splits=2)
for train_indices, test_indices in kf.split(word_data):
    feature_train = [word_data[ii] for ii in train_indices]
    feature_test = [word_data[ii] for ii in test_indices]
    authors_train = [authors[ii] for ii in train_indices]
    authors_test = [authors[ii] for ii in test_indices]
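The "average final result" step can be done in a single call with cross_val_score; a minimal sketch, assuming an SVC on the iris data from above:

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Runs k=5 train/test rounds and returns one score per fold.
scores = cross_val_score(SVC(kernel='linear'), iris_data, iris_target, cv=5)
print(scores.mean())   # the averaged cross-validation accuracy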
GridSearchCV:
Parameter tuning is the process of selecting the values of a model's hyperparameters that maximize its accuracy.
Scikit-learn provides GridSearchCV, an object that, given data, fits the estimator once per combination in a parameter grid and keeps the combination that maximizes the cross-validation score.
By default, GridSearchCV's cross-validation uses KFold, or StratifiedKFold when the estimator is a classifier (3 folds historically, 5 folds in newer scikit-learn releases).
Sklearn:
from sklearn import svm
from sklearn.model_selection import GridSearchCV   # sklearn.grid_search in older releases

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
svr = svm.SVC()
clf = GridSearchCV(svr, parameters)
clf.fit(iris_data, iris_target)
print(clf.best_params_)
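After the fit, the winning model and its score are also available through standard GridSearchCV attributes; a short usage sketch continuing the example above:

print(clf.best_score_)               # mean cross-validated score of the best combination
best_model = clf.best_estimator_     # the SVC refitted on all the data with the best parameters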