Choosing the Number of Principal Components

  • Features in large-scale data are often n-dimensional with n in the thousands, and most of them are highly correlated
  • This lesson shows how to reduce, for example, a thousand features (n) to a hundred (k) effectively while still retaining most of the data


  • To choose k, we need a quantitative measure of how much of the original data's variation is retained
  • Common targets are 99%, 95%, or 90%; the preferred phrasing is "x% of variance is retained"
  • This is much simpler than saying "we chose k components because ... and ... and the resulting error is ..."
  • The typical choice is to retain >= 95% of the variance
  • The check is the ratio (average squared projection error) / (total variation in the data), i.e. [(1/m) Σ ||x(i) - x_approx(i)||^2] / [(1/m) Σ ||x(i)||^2]; if this ratio is <= 0.05, then "95% of variance is retained" (the % retained is 1 minus this ratio, not the ratio itself)
  • Check whether k = 1 satisfies the requirement; if it doesn't, increase k by one and check again
  • Re-running PCA and recomputing the projection error from scratch at every iteration is very inefficient
  • Instead, run SVD once: in [U, S, V] = svd(Sigma), S is a diagonal matrix (all off-diagonal entries zero), and the variance retained for a given k is simply (Σ_{i=1..k} S_ii) / (Σ_{i=1..n} S_ii), which is much cheaper to evaluate per iteration
  • Remember that k in PCA is the number of components (reduced dimensions) kept, and the ratio above tells you what % of the original data's variance those k components retain
  • Alternatively, initialize k = 1
  • Then increase k step by step, evaluating the singular-value ratio above each time, and choose the smallest k for which "99% of variance is retained" is satisfied (see the sketch after this list)
  • If you want to set k manually, back the choice up with the formula given above
  • That way, others can see the justification behind your recommendation and approve it
  • PCA chooses the projection that minimizes the average squared projection error between the original data (x) and its projection (x_approx)
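
A minimal numpy sketch of this procedure. The data matrix shape (m samples × n features), the function name choose_k, and the synthetic data are illustrative assumptions, not from the lesson:

```python
import numpy as np

def choose_k(X, retain=0.99):
    """Smallest k whose top-k singular values of the covariance
    matrix retain at least `retain` of the total variance."""
    m = X.shape[0]
    Xc = X - X.mean(axis=0)          # center the features first
    Sigma = (Xc.T @ Xc) / m          # n x n covariance matrix
    U, s, _ = np.linalg.svd(Sigma)   # s = diagonal of S, sorted descending
    retained = np.cumsum(s) / np.sum(s)   # variance retained for k = 1..n
    k = int(np.searchsorted(retained, retain)) + 1   # smallest k meeting target
    return k, U, retained

# Illustrative data: 500 samples of 1000 highly correlated features
# (built from 50 latent factors, so a small k should suffice).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50)) @ rng.normal(size=(50, 1000))

k, U, retained = choose_k(X, retain=0.99)
print(f"k = {k}, variance retained = {retained[k - 1]:.4f}")
```

Because the singular values come from a single SVD call, checking every candidate k is just a cumulative sum, which is exactly the efficiency gain described above.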


  • SUMMARY
  • For thousand-dimensional, highly correlated data, PCA compresses the very large feature vectors into a much smaller k-dimensional representation while retaining most of the data (demonstrated in the sketch below)
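
Continuing the sketch above (reusing X, U, and k from it), a quick check that compressing to k components and reconstructing yields the promised error ratio, which equals 1 minus the variance retained:

```python
# Compress to k dimensions, reconstruct, and verify that
# avg squared projection error / total variation <= 0.01 (99% target).
Xc = X - X.mean(axis=0)
U_reduce = U[:, :k]            # n x k: the first k principal directions
Z = Xc @ U_reduce              # compressed data, m x k instead of m x n
X_approx = Z @ U_reduce.T      # map back into the original n-dim space
ratio = np.sum((Xc - X_approx) ** 2) / np.sum(Xc ** 2)
print(f"error ratio = {ratio:.6f}")   # equals 1 - retained[k - 1]
```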