• Alternatively, we can initialize k = 1 and increase it one step at a time:
  • Then, for each k, compute the variance retained as in the graph on the right, and choose the smallest k for which "99% of variance retained" is satisfied.
  • If we set k manually, we should back up the choice with the formula given above.
  • That way others can see the basis for the recommendation and approve it.
  • PCA tries to minimize the projection error, i.e. the distance between the original data (x) and its projection (x.approx).
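
The search for k described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `choose_k`, the `retained` parameter, and the toy data are hypothetical and not from the notes. It computes the covariance matrix of the mean-normalized data, takes its singular values, and returns the smallest k whose cumulative share of the total reaches the threshold.

```python
import numpy as np

def choose_k(X, retained=0.99):
    """Return (k, ratio): the smallest k keeping `retained` of the variance,
    and the cumulative variance ratio for k = 1..n. Hypothetical helper."""
    # Mean-normalize each feature so the data is centered at zero.
    X_centered = X - X.mean(axis=0)
    m = X_centered.shape[0]
    # Covariance matrix (n x n) and its singular values, largest first.
    sigma = X_centered.T @ X_centered / m
    _, s, _ = np.linalg.svd(sigma)
    # ratio[k-1] = variance retained when keeping the first k components.
    ratio = np.cumsum(s) / np.sum(s)
    # Index of the first entry with ratio >= retained, converted to a count.
    k = int(np.searchsorted(ratio, retained)) + 1
    # Guard against rounding pushing k past the number of features.
    return min(k, len(s)), ratio

# Toy data (an assumption): two uncorrelated features with variances 9 and 1.
X_demo = np.array([[3.0, 1.0], [-3.0, 1.0], [3.0, -1.0], [-3.0, -1.0]])
k_demo, ratio_demo = choose_k(X_demo, retained=0.99)
print(k_demo, ratio_demo)
```

Computing the singular values once and scanning the cumulative ratio avoids rerunning PCA for every candidate k.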


  • SUMMARY
  • For data with thousands of highly correlated dimensions, PCA can compress the data by a very large factor while retaining most of the information (variance).
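
To make the summary concrete, here is a minimal sketch of compressing data with PCA and measuring how much variance survives the round trip. The names `pca_compress` and `variance_retained` and the synthetic data are assumptions for illustration, not part of the notes. The data is built so that all three features are multiples of one underlying value, which is the extreme case of "highly correlated".

```python
import numpy as np

def pca_compress(X, k):
    """Project X (m x n) onto its top-k principal components.
    Returns the compressed data Z (m x k) and the reconstruction X_approx (m x n)."""
    mu = X.mean(axis=0)
    X_centered = X - mu
    sigma = X_centered.T @ X_centered / X_centered.shape[0]
    U, _, _ = np.linalg.svd(sigma)
    U_reduce = U[:, :k]               # first k principal directions
    Z = X_centered @ U_reduce         # compressed representation
    X_approx = Z @ U_reduce.T + mu    # map back to the original space
    return Z, X_approx

def variance_retained(X, X_approx):
    """1 - (average squared projection error) / (total variation in the data)."""
    mu = X.mean(axis=0)
    error = np.mean(np.sum((X - X_approx) ** 2, axis=1))
    total = np.mean(np.sum((X - mu) ** 2, axis=1))
    return 1.0 - error / total

# Synthetic example (an assumption): every row is a multiple of [1, 2, 3],
# so the three features are perfectly correlated.
t = np.arange(10, dtype=float).reshape(-1, 1)
X_demo = t * np.array([[1.0, 2.0, 3.0]])
Z_demo, X_approx_demo = pca_compress(X_demo, k=1)
print(Z_demo.shape, variance_retained(X_demo, X_approx_demo))
```

Real data is rarely this cleanly correlated, so the retained variance will typically be below 1 and k is chosen to keep it above a threshold such as 99%.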