Machine Learning 3: Decision Tree


DECISION TREE
A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. The paths from root to leaf represent classification rules.


Parameters:
  • min_samples_split : the minimum number of samples a node must contain to be split further.
  • criterion : the metric used at each step to choose the variable that best splits the set. Different algorithms use different metrics for measuring "best".
Common ones: gini, information gain (explained below). Both parameters appear in the sketch after this list.
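
For instance, both parameters can be passed directly to sklearn's DecisionTreeClassifier (a minimal sketch; the values shown are arbitrary choices, not the defaults):

from sklearn.tree import DecisionTreeClassifier

# criterion="entropy" selects splits by information gain; "gini" is the default
# min_samples_split=10: a node with fewer than 10 samples is not split further
clf = DecisionTreeClassifier(criterion="entropy", min_samples_split=10)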


Information gain:
A value indicating how useful a feature is as a split. The algorithm tries to maximize information gain:
INFORMATION GAIN = ENTROPY(parent) − [weighted average] ENTROPY(children)


Entropy is a measure of impurity in the data:
ENTROPY = − Σᵢ pᵢ log₂(pᵢ)
0 ≤ entropy ≤ 1
pᵢ is the fraction of examples in class i
the sum Σᵢ runs over all classes
If all data is from the same class → entropy = 0
If the data is evenly split between two classes → entropy = 1 (the maximum for a binary problem)
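
As a sketch, the formula translates directly into a few lines of Python (the entropy helper below is illustrative, not a library function):

from collections import Counter
from math import log2

def entropy(labels):
    # -sum of p_i * log2(p_i) over every class i present in the labels
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

print(entropy(["slow", "slow", "fast", "fast"]))  # evenly split -> 1.0
print(entropy(["fast", "fast", "fast", "fast"]))  # single class -> 0.0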


EXAMPLE:

GRADE (feature)   SPEED LIMIT (feature)   SPEED (label)
steep             yes                     slow
steep             yes                     slow
flat              no                      fast
steep             no                      fast



In this case the information gain from splitting on "SPEED LIMIT" is higher than from splitting on "GRADE": SPEED LIMIT separates the labels perfectly (both children are pure, entropy 0), while the steep branch of GRADE still mixes slow and fast examples.
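
Reusing the entropy sketch above, the two gains can be checked by hand (the information_gain helper is illustrative, not a library function):

def information_gain(parent, children):
    # parent entropy minus the weighted average entropy of the children
    total = len(parent)
    return entropy(parent) - sum(len(c) / total * entropy(c) for c in children)

speed = ["slow", "slow", "fast", "fast"]

# splitting on SPEED LIMIT: yes -> rows 1-2, no -> rows 3-4 (both children pure)
print(information_gain(speed, [["slow", "slow"], ["fast", "fast"]]))  # 1.0

# splitting on GRADE: steep -> rows 1, 2, 4 (mixed), flat -> row 3
print(information_gain(speed, [["slow", "slow", "fast"], ["fast"]]))  # ~0.31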


Sklearn:
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(feature_training, label_training)        # train the tree on the training set
prediction = clf.predict(feature_test)           # predict labels for unseen features

accuracy = clf.score(feature_test, label_test)   # fraction of correctly predicted labels
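
Put together on the toy table above, the whole pipeline fits in a few lines (the 0/1 encoding of GRADE and SPEED LIMIT is an arbitrary choice, since sklearn expects numeric features):

from sklearn.tree import DecisionTreeClassifier

# columns: GRADE (steep=1, flat=0), SPEED LIMIT (yes=1, no=0)
features = [[1, 1], [1, 1], [0, 0], [1, 0]]
labels = ["slow", "slow", "fast", "fast"]

clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(features, labels)
print(clf.predict([[0, 1]]))  # flat grade with a speed limit -> ['slow']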
