Machine Learning 9: Feature Selection


FEATURE SELECTION
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for four reasons:
  • simplification of models to make them easier to interpret by researchers/users
  • shorter training times
  • to avoid the curse of dimensionality
  • enhanced generalization by reducing overfitting


Add a new feature (feature extraction):
Feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretations.
  • Use human intuition
  • Code the new feature
  • Visualize
  • Repeat
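
As a small illustration of that loop, here is a sketch that builds a hypothetical derived feature (the bonus-to-salary ratio) with pandas and visualizes it against the label; the column names and toy data are assumptions made for the example, not part of the original notes:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical raw data, just to make the example self-contained.
df = pd.DataFrame({
    "salary": [60000, 120000, 45000, 200000],
    "bonus":  [3000, 40000, 1000, 150000],
    "poi":    [0, 1, 0, 1],   # label we hope the new feature will separate
})

# Human intuition: the bonus-to-salary ratio might be informative.
df["bonus_salary_ratio"] = df["bonus"] / df["salary"]

# Visualize the new feature against the label, then decide whether to keep it.
plt.scatter(df["bonus_salary_ratio"], df["poi"])
plt.xlabel("bonus / salary")
plt.ylabel("poi label")
plt.show()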


Getting rid of a feature (feature selection):
There is an optimal number of features that balances bias and variance, and the process of finding this balance point is called regularization.
There are two big univariate feature selection tools in sklearn: SelectPercentile and SelectKBest. The difference is apparent from the names: SelectPercentile keeps the X% of features that are most powerful (where X is a parameter) and SelectKBest keeps the K features that are most powerful (where K is a parameter).
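
A minimal sketch of both tools, assuming the iris dataset and the ANOVA F-test score just for illustration:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

features, labels = load_iris(return_X_y=True)

# Keep the 2 strongest features according to the ANOVA F-score.
k_best = SelectKBest(score_func=f_classif, k=2)
features_k = k_best.fit_transform(features, labels)

# Keep the top 50% of features by the same score.
percentile = SelectPercentile(score_func=f_classif, percentile=50)
features_p = percentile.fit_transform(features, labels)

print(features.shape, features_k.shape, features_p.shape)  # (150, 4) (150, 2) (150, 2)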


Lasso Regression:
One of these methods is Lasso regression, which adds to the usual least-squares objective a penalty on the coefficients:


minimize SSE + λ * Σ|βᵢ|

The formula says we look for the best trade-off between a small sum of squared errors (SSE) and small coefficients: the stronger the penalty λ, the more coefficients are pushed to exactly zero, which effectively limits the number of features.
Lasso also ranks the features: each feature gets its own coefficient βᵢ, so ordering the features by the absolute value of their coefficients gives a list of the most important ones, while features whose coefficients end up at zero are discarded.


Sklearn:
from sklearn.linear_model import Lasso

# Toy data just to make the snippet runnable: one sample per row, two features.
features = [[1, 2], [2, 4], [3, 9], [4, 16]]
labels = [3, 6, 12, 20]

regression = Lasso()
regression.fit(features, labels)
print(regression.predict([[2, 4]]))  # predict() expects a 2D array of samples
print(regression.coef_)              # one coefficient per feature
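
Building on the ordering idea above, here is a minimal sketch of ranking features by the absolute value of their Lasso coefficients; the feature names are hypothetical, just for illustration:

# Rank the features used above by |coefficient|; zeros mean the feature was dropped.
feature_names = ["feature_a", "feature_b"]  # hypothetical names for the two toy columns
ranking = sorted(zip(feature_names, regression.coef_),
                 key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    print(name, coef)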
