• These are the mathematical steps of the K-means algorithm
  • First, set K manually
  • Remember that the y values (labels) are no longer needed
  • The x0 convention is not used either
  • First, randomly initialize all the cluster centroids
  • Then create a vector with one element per example, where the ith element holds the index of the cluster centroid closest to example i (the one at the shortest distance)
  • Then, for every cluster, take all the examples assigned to it in the previous step, compute their average (mean), and move the cluster centroid to that mean
  • Uppercase K denotes the total number of clusters
  • Lowercase k is the index of a single cluster centroid
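  • The two steps above can be sketched in plain Python as follows. This is a minimal sketch, not a definitive implementation: the function names (`assign_clusters`, `move_centroids`, `k_means`), the fixed iteration count, the seed, and the choice to initialize by picking K random examples are all assumptions made for illustration.

```python
import math
import random


def assign_clusters(points, centroids):
    """Cluster assignment step: for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def move_centroids(points, assignments, centroids):
    """Move-centroid step: each centroid becomes the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == k]
        if not members:
            # Assumption for this sketch: a centroid with no points stays where it is.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centroids


def k_means(points, K, iterations=10, seed=0):
    rng = random.Random(seed)
    # Assumed initialization: use K distinct examples as the starting centroids.
    centroids = rng.sample(points, K)
    for _ in range(iterations):
        assignments = assign_clusters(points, centroids)
        centroids = move_centroids(points, assignments, centroids)
    # Final assignment so the returned indices match the returned centroids.
    return centroids, assign_clusters(points, centroids)
```

  • For example, `k_means([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], K=2)` returns two centroids and a list of four cluster indices, one per example.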

  • Each x is an m-dimensional vector
  • If a cluster centroid ends up with no points assigned to it, it is best to eliminate it, which leaves K-1 clusters. Alternatively, reinitialize that centroid, although eliminating it is the wiser choice
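  • A minimal sketch of the "eliminate it" option, assuming points and centroids are tuples of numbers. The helper name `drop_empty_centroids` and the sample values are made up for illustration.

```python
import math


def drop_empty_centroids(points, centroids):
    """Keep only the centroids that are the closest centroid for at least one point."""
    used = set()
    for p in points:
        closest = min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        used.add(closest)
    return [c for k, c in enumerate(centroids) if k in used]


# Made-up values: the third centroid is far away from every point.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centroids = [(1.0, 1.5), (8.5, 8.0), (50.0, 50.0)]

kept = drop_empty_centroids(points, centroids)
print(len(kept))  # 2, so K - 1 centroids remain
```

  • The alternative, reinitializing the empty centroid, would replace it with a new random position instead of removing it from the list.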

  • What about non-separated clusters?
  • Suppose we collected data from people whose weight and height are correlated
  • How large should the cluster for each of the sizes S, M, and L be?
  • K-means still separates the data into clusters
  • This is where market segmentation can benefit. With three separate groups of people, we can design size S to fit the S group, size M to fit the M group, and size L to fit the L group. This way the company can build its product and segment it for each specific group it decides to target.
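  • The sizing idea can be sketched as follows. Everything here is an assumption made for illustration: the height and weight values are made up, the helper names (`closest`, `mean`, `segment`) are hypothetical, and the starting centroids are fixed rather than random so that the run is repeatable.

```python
import math

# Made-up (height in cm, weight in kg) pairs, for illustration only.
people = [
    (150, 48), (155, 52), (158, 55), (162, 59), (165, 62), (168, 66),
    (172, 70), (175, 74), (178, 79), (182, 84), (186, 90), (190, 96),
]


def closest(p, centroids):
    """Index of the centroid nearest to point p."""
    return min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def segment(points, centroids, iterations=20):
    """Repeat the assignment and move-centroid steps a fixed number of times."""
    for _ in range(iterations):
        assignments = [closest(p, centroids) for p in points]
        moved = []
        for k, old in enumerate(centroids):
            members = [p for p, c in zip(points, assignments) if c == k]
            # Keep the old position if no point was assigned to this centroid.
            moved.append(mean(members) if members else old)
        centroids = moved
    return centroids, [closest(p, centroids) for p in points]


# Fixed starting centroids (shortest, a mid-range, and tallest person): an assumption.
start = [people[0], people[6], people[-1]]
centroids, assignments = segment(people, start)

# Name the clusters by centroid height: smallest is S, largest is L.
order = sorted(range(len(centroids)), key=lambda k: centroids[k][0])
size_of = dict(zip(order, ["S", "M", "L"]))
labels = [size_of[c] for c in assignments]
print(labels)
```

  • Even though these made-up points form one continuous band with no visible gaps, the sketch still splits them into three groups (four people in S, five in M, three in L).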

  • So we can implement the K-means algorithm by now, but it is discussed in more depth in the next video