Machine Learning 1: NAIVE BAYES
Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
Theorem:
P(A|B) = P(B|A) P(A) / P(B)
In a hospital the probability of a liver disease is 10%, the probability of a patient being alcoholic is 5%, and among those with a liver disease 7% are alcoholic. What is the probability of a liver disease if the patient is alcoholic?
P(L) = 0.1
P(A) = 0.05
P(A|L) = 0.07
P(L|A) = P(A|L)*P(L) / P(A) = 0.07*0.1 / 0.05 = 0.14 => 14%
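The same computation in Python, as a minimal sketch of the formula above (the variable names are illustrative):

# Bayes' theorem applied to the hospital example
p_liver = 0.10       # P(L): prior probability of liver disease
p_alcoholic = 0.05   # P(A): prior probability of being alcoholic
p_a_given_l = 0.07   # P(A|L): probability of being alcoholic given liver disease

# P(L|A) = P(A|L) * P(L) / P(A)
p_l_given_a = p_a_given_l * p_liver / p_alcoholic
print(p_l_given_a)   # 0.14 -> 14%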
It's a popular method for text categorization that uses word frequencies as the features, ignoring word order: it assumes that the value of a particular feature is independent of the value of any other feature. In practice it counts the occurrences of each word in a text sample and turns those counts into per-word probabilities; to attribute a particular email to someone, it multiplies the probabilities of every word in the email being written by each candidate sender and compares the results, as in the example below.
SENDER: CHRIS - Love 0.1 - Deal 0.8 - Life 0.1
SENDER: SARA - Love 0.5 - Deal 0.2 - Life 0.3
P(CHRIS) = P(SARA) = 0.5
TEXT: "Love Deal"
P(CHRIS | "Love Deal") ∝ 0.1*0.8*0.5 = 0.04 -> 0.04/0.09 ≈ 44%
P(SARA | "Love Deal") ∝ 0.5*0.2*0.5 = 0.05 -> 0.05/0.09 ≈ 56%
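The same calculation as a short Python sketch (the word tables, priors, and sender names are taken from the example above):

# Per-sender word probabilities from the example above
word_probs = {
    "CHRIS": {"love": 0.1, "deal": 0.8, "life": 0.1},
    "SARA":  {"love": 0.5, "deal": 0.2, "life": 0.3},
}
priors = {"CHRIS": 0.5, "SARA": 0.5}

text = "love deal"

# Multiply the prior by the probability of each word for each sender
scores = {}
for sender in word_probs:
    score = priors[sender]
    for word in text.split():
        score *= word_probs[sender][word]
    scores[sender] = score

# Normalize so the posteriors sum to 1
total = sum(scores.values())
for sender, score in scores.items():
    print(sender, score / total)  # CHRIS ~0.44, SARA ~0.56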
Sklearn:

from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()                              # create the classifier
clf.fit(feature_training, label_training)       # train on the training set
prediction = clf.predict(feature_test)          # predict labels for the test set
accuracy = clf.score(feature_test, label_test)  # mean accuracy on the test set
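As a self-contained usage example (the synthetic dataset below is made up for illustration):

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Generate a small synthetic classification dataset (illustrative only)
features, labels = make_classification(n_samples=200, n_features=4, random_state=42)
feature_training, feature_test, label_training, label_test = train_test_split(
    features, labels, test_size=0.3, random_state=42
)

clf = GaussianNB()
clf.fit(feature_training, label_training)
print(clf.score(feature_test, label_test))  # accuracy on the held-out test set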
