Machine Learning 6 : Clustering


CLUSTERING
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).


It is a form of unsupervised learning because the answer is not provided: no label is associated with the data to indicate the correct group.



K-Means algorithm:
Clustering can be achieved by various algorithms. The most basic and most widely used is k-means clustering.
Given an initial set of k randomly chosen cluster centers, the algorithm proceeds by alternating between two steps:
  • Assignment step: Assign each data point to the "nearest" cluster center (the one with the least squared Euclidean distance)
  • Update step: Move each cluster center to the mean of the data points assigned to it, which is the position that minimizes the sum of squared distances between the center and those points
The algorithm has converged when the assignments no longer change. There is no guarantee that the global optimum is found. The algorithm does, however, converge to a local optimum, so it is commonly run multiple times with different random initializations, keeping the best of the runs.
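The two steps above can be sketched in plain Python. This is only an illustrative sketch, not how sklearn implements it: the function names kmeans and squared_distance, the seed parameter, and the six sample points are assumptions made for this example, and the initial centers are simply k data points picked at random.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=300, seed=0):
    """Minimal k-means sketch: returns (centers, labels)."""
    rng = random.Random(seed)
    # Initial step: pick k distinct data points as the starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the center with the
        # least squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Converged: the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


# Hypothetical sample data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, labels = kmeans(points, k=2)
print(labels)
print(centers)
```

On data this clearly separated, the first three points should end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization.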


Sklearn:
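from sklearn.cluster import KMeans

# data and test are assumed to be arrays of shape (n_samples, n_features)
kmeans = KMeans(n_clusters=8,   # number of clusters
                n_init=10,      # number of runs with different initial cluster centers
                max_iter=300)   # maximum number of iterations per run
kmeans.fit(data)                # compute the cluster centers from the data
kmeans.predict(test)            # index of the nearest cluster center for each sample

kmeans.cluster_centers_         # coordinates of the cluster centers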
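
To try these calls end to end, here is a small self-contained sketch. The synthetic data (two Gaussian blobs), the two test points, and the choice of n_clusters=2 and random_state=0 are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: two blobs of 50 points each, around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])

# Hypothetical new samples, one near each blob.
test = np.array([[0.2, -0.1], [4.8, 5.3]])

kmeans = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0)
kmeans.fit(data)

predicted = kmeans.predict(test)   # cluster index for each test sample
print(predicted)
print(kmeans.cluster_centers_)     # one row per cluster center
print(kmeans.inertia_)             # sum of squared distances to the nearest center
```

The inertia_ attribute is the quantity that sklearn compares across the n_init runs in order to keep the best one.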
