
A clustering algorithm is used to find structure in data that is not labeled to begin with.

K-means is one example of a clustering algorithm, and one of the most popular and widely used clustering algorithms to date.

Two steps in K-means:

Cluster Assignment Step

Move Centroid Step

Before the first step, place K cluster centroids (K is the number of clusters that we want)

First step (Cluster Assignment Step): color each example according to which cluster centroid it is closest to

Second step (Move Centroid Step): take the average (mean) of all the examples that have the same color as the centroid, and move the centroid to the resulting point

Go back to the Cluster Assignment Step, and recolor the examples once again

Then continue to the Move Centroid Step again,

and once more back to the Cluster Assignment Step

Iterate over and over again until the cluster centroids don't move any further
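
The two steps and the loop around them can be sketched in a few lines of plain Python. This is only a minimal sketch, not a reference implementation: the function names (assign_clusters, move_centroids, kmeans), the max_iters cap, and the choice to leave an empty centroid where it is are all assumptions made for illustration.

```python
import math

def assign_clusters(points, centroids):
    """Cluster Assignment Step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def move_centroids(points, labels, centroids):
    """Move Centroid Step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, labels) if c == k]
        if not members:
            # Assumption: a centroid with no points simply stays where it is.
            new_centroids.append(old)
            continue
        dims = len(members[0])
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
        )
    return new_centroids

def kmeans(points, initial_centroids, max_iters=100):
    """Alternate the two steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iters):
        labels = assign_clusters(points, centroids)
        new_centroids = move_centroids(points, labels, centroids)
        if new_centroids == centroids:
            break  # no centroid moved, so the algorithm has converged
        centroids = new_centroids
    return centroids, assign_clusters(points, centroids)
```

For example, kmeans([(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)], [(0.0, 0.0), (10.0, 10.0)]) groups the first two points under one centroid and the last two under the other.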

These are the mathematical steps of the K-means algorithm

First, let's manually set K

Remember that the y values (labels) are not needed anymore

and the x0 convention is dropped

First, initialize all the cluster centroids randomly

Then create a vector over the examples in which the ith element holds the index of the cluster centroid closest to the ith example (find the shortest distance)

Then, for every cluster, take all the examples assigned to it earlier, compute their average (mean), and move the cluster centroid to that point

Uppercase K denotes the total number of clusters

Lowercase k is the index of a cluster centroid

x is an m-dimensional vector
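
In symbols, the two steps can be written as below. The names c^(i) (index of the centroid assigned to example i) and mu_k (position of centroid k) are assumptions here; they are the usual notation for K-means, but these notes only name K, k and x.

```latex
% Cluster Assignment Step: for each example i
c^{(i)} := \arg\min_{k \in \{1,\dots,K\}} \lVert x^{(i)} - \mu_k \rVert^2

% Move Centroid Step: for each cluster k
\mu_k := \frac{1}{\lvert \{ i : c^{(i)} = k \} \rvert} \sum_{i \,:\, c^{(i)} = k} x^{(i)}
```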

If a cluster centroid ends up with no points assigned to it, it is best to eliminate it, which leaves K-1 clusters. The alternative is to reinitialize it randomly, although the first option is the wiser one
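
Both options for an empty cluster can be sketched as one small helper. The function name handle_empty_clusters, its reinitialize flag, and the choice to reinitialize onto a randomly picked example are hypothetical, chosen only to illustrate the idea.

```python
import random

def handle_empty_clusters(points, labels, centroids, reinitialize=False, rng=None):
    """Drop (default) or reinitialize every centroid that has no assigned point.

    labels[i] is the index of the centroid assigned to points[i].
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    kept = []
    for k, centroid in enumerate(centroids):
        if k in labels:
            kept.append(centroid)  # at least one point uses this centroid
        elif reinitialize:
            # Assumption: reinitialize onto a randomly chosen example.
            kept.append(tuple(rng.choice(points)))
        # otherwise the empty centroid is eliminated, leaving K-1 clusters
    return kept
```

After eliminating a centroid the remaining centroids are renumbered, so the Cluster Assignment Step has to be run again before the labels are used.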

How about non-separated clusters?

Suppose we collected data from people whose weight and height are correlated

How big should the cluster be for each of S, M, and L?

K-means still separates the data

This is where market segmentation can benefit. From three separate groups of people, we can design size S to match the S-group, size M to match the M-group, and size L to match the L-group. This way the company can build its product and segment it for each specific group that it decides on.
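
As a rough illustration of the sizing idea, here is a small sketch that runs K-means with K = 3 on height and weight data. The measurements are made up for this example, and the starting centroids are picked by hand instead of randomly so that the run is reproducible; both are assumptions, not data from the lecture.

```python
import math

# Hypothetical (height in cm, weight in kg) measurements, made up for illustration.
people = [(150, 48), (154, 52), (158, 55), (163, 60), (167, 63),
          (171, 67), (176, 72), (181, 77), (186, 82)]

def closest(p, centroids):
    """Index of the centroid nearest to point p."""
    return min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))

def mean(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))

# Assumption: start from the shortest, middle and tallest person.
centroids = [people[0], people[4], people[8]]
while True:
    labels = [closest(p, centroids) for p in people]          # Cluster Assignment Step
    moved = [mean([p for p, c in zip(people, labels) if c == k])
             for k in range(3)]                               # Move Centroid Step
    if moved == centroids:
        break  # centroids did not move: converged (no cluster is empty for this data)
    centroids = moved

sizes = ["S", "M", "L"]  # valid here because the centroids stay ordered by height
for person, label in zip(people, labels):
    print(person, "->", sizes[label])
```

Even though the points lie along one continuous band with no visible gaps, the data still ends up split into three groups, and each centroid can serve as the reference measurement for one size.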

So we can implement the K-means algorithm by now, but it will be discussed in more depth in the next video