K-means algorithm

2014-05-06 13:29 | Source

Clustering algorithms are used to group data that is not labeled to begin with (unsupervised learning).
K-means is one such algorithm, and it is among the most popular and widely used clustering algorithms to date.

First step (Cluster Assignment Step): start with K cluster centroids (K is the number of clusters we want).
Color each example according to the cluster centroid it is closest to.

Second step (Move Centroid Step): for each centroid, take the mean of all the examples that have the same color as that centroid, and move the centroid to that point.
Go back to the Cluster Assignment Step and recolor the examples.
Then run the Move Centroid Step again,
followed by the Cluster Assignment Step again.
Repeat until the cluster centroids stop moving.
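
The two steps above can be sketched in plain Python. This is only a minimal illustration, not a reference implementation: the function names (assign_clusters, move_centroids, k_means) and the max_iters cap are my own assumptions, the distance is assumed to be Euclidean, and the starting centroids are passed in by the caller.

```python
import math

def assign_clusters(points, centroids):
    """Cluster assignment step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def move_centroids(points, labels, centroids):
    """Move centroid step: move each centroid to the mean of its points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # no points assigned: leave it in place
            continue
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return new_centroids

def k_means(points, initial_centroids, max_iters=100):
    """Alternate the two steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iters):  # max_iters is a safety cap (an assumption)
        new_centroids = move_centroids(points, labels, centroids)
        if new_centroids == centroids:  # centroids did not move: converged
            break
        centroids = new_centroids
        labels = assign_clusters(points, centroids)
    return centroids, labels

# Tiny made-up data set: two obvious groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
centroids, labels = k_means(points, [(0.0, 0.0), (6.0, 6.0)])
```

On this toy input the first two points end up with one centroid and the last two with the other.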

First, initialize all the cluster centroids randomly.
Then build a vector with one entry per example, where the ith entry holds the index of the cluster centroid closest to the ith example (the one at the shortest distance).
Then, for every cluster, collect all the examples assigned to it, take their mean, and move that cluster's centroid to the resulting point.
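
Written as formulas, the two updates might look like the following. The symbols are my own labels and do not come from the notes above: x^(i) is the ith example, c^(i) the index of the cluster assigned to it, mu_k the kth centroid, and C_k the set of examples currently assigned to cluster k. Squared Euclidean distance is assumed.

```latex
% Cluster assignment step: pick the closest centroid for each example
c^{(i)} := \arg\min_{k \in \{1, \dots, K\}} \left\lVert x^{(i)} - \mu_k \right\rVert^2

% Move centroid step: each centroid becomes the mean of its assigned examples
\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)},
\qquad C_k = \{\, i : c^{(i)} = k \,\}
```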

Each example x is an m-dimensional vector.
If a cluster centroid ends up with no points assigned to it, it is usually best to eliminate it, which leaves K-1 clusters. The alternative is to reinitialize that centroid randomly, although the first option is the more common choice.
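
As a rough sketch of the "eliminate it" option, the helper below drops every centroid that has no points and renumbers the remaining cluster indices. The name drop_empty_centroids and the list-based data layout are assumptions made for illustration.

```python
def drop_empty_centroids(centroids, labels):
    """Remove centroids with no assigned points and renumber the labels."""
    used = sorted(set(labels))  # cluster indices that own at least one point
    remap = {old: new for new, old in enumerate(used)}
    kept = [centroids[k] for k in used]
    return kept, [remap[label] for label in labels]

# Made-up example: centroid 1 has no points, so K = 3 becomes K - 1 = 2.
kept, new_labels = drop_empty_centroids(
    [(0.0, 0.0), (9.0, 9.0), (5.0, 5.0)],
    [0, 2, 2, 0],
)
```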

What about non-separated clusters?
Suppose we collected height and weight data from a group of people, where the two are correlated.
How large should the cluster be for each of the sizes S, M, and L?
K-means still separates the data into groups, even when there are no obvious gaps between them.
This is where market segmentation can benefit. With three separate groups of people, we can design size S to fit the S group, M for the M group, and L for the L group. This way a company can build its product and segment it for each specific group it decides to target.
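
To make the sizing idea concrete, here is a small sketch of how fitted centroids could be used afterwards. The three centroids are hypothetical numbers that I made up to stand in for the output of K-means with K = 3 on height and weight data; the name nearest_size and the units (cm, kg) are also assumptions.

```python
import math

# Hypothetical centroids (height in cm, weight in kg), made up for
# illustration; real values would come from running K-means with K = 3.
SIZE_CENTROIDS = {
    "S": (158.0, 52.0),
    "M": (170.0, 66.0),
    "L": (182.0, 82.0),
}

def nearest_size(height_cm, weight_kg):
    """Return the size whose centroid is closest to the given person."""
    return min(
        SIZE_CENTROIDS,
        key=lambda size: math.dist((height_cm, weight_kg), SIZE_CENTROIDS[size]),
    )

size = nearest_size(171.0, 68.0)  # closest to the hypothetical M centroid
```

Every person falls into exactly one group this way, even though the underlying data has no clear gaps between the groups.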

At this point we can implement the K-means algorithm; the next video discusses it in more depth.