# Choosing the Number of Principal Components

• Large data sets often have on the order of a thousand features (n dimensions), many of which are highly correlated.
• This lesson shows how to reduce, for example, a thousand features (n) to a hundred (k) while still retaining most of the information in the data.

• To choose k, we need a numeric measure of how much of the original data's variance is retained.
• Common targets are 99%, 95%, or 90%. The preferred way to state the result is "x% of variance is retained".
• This is much simpler than saying "we chose k components because ... and ... and the resulting error is ...".
• A typical choice is to retain at least 95% of the variance.
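• The ratio (average squared projection error) / (total variation in the data) is the fraction of variance lost, so the fraction retained is 1 minus that ratio. Retaining 99% of the variance means the ratio is at most 0.01.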
• Start with k = 1 and check whether it meets the requirement. If it does not, increase k by one and check again.
• This is very inefficient if PCA and the projection error are recomputed from scratch for every k.
• The matrix S in the decomposition USV is diagonal (all off-diagonal entries are zero), so the check is much cheaper from its diagonal: with k components, the fraction of variance retained is (sum of the first k diagonal entries of S) / (sum of all n diagonal entries).
• Remember that k in PCA is the number of components kept. Each k corresponds to a percentage of variance retained, in other words, how much of the original data's variation is preserved.
• So the efficient alternative is to compute the decomposition once and then start from k = 1.
• For each k, compute the retained-variance ratio from the diagonal of S, as in the graph on the right, and choose the smallest k for which "99% of variance retained" is satisfied.
• If we set k manually, we should still back the choice up with the formula given above.
• That way others can see the justification for the recommendation and approve it.
• PCA tries to minimize the projection error, that is, the distance between the original data (x) and its projection (x.approx).
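
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the function name `choose_k` and the sample values are hypothetical, and it assumes the diagonal of S has already been computed once (for example by an SVD of the covariance matrix) and is sorted in descending order.

```python
from itertools import accumulate


def choose_k(singular_values, retain=0.99):
    """Return (k, fraction) for the smallest k whose retained variance >= retain.

    singular_values: diagonal entries of S, assumed sorted in descending order.
    retain: target fraction of variance to keep, e.g. 0.99 for "99% retained".
    """
    if not 0 < retain <= 1:
        raise ValueError("retain must be in the interval (0, 1]")
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("singular values must have a positive sum")

    # Running sum of the first k diagonal entries, for k = 1, 2, ..., n
    for k, running in enumerate(accumulate(singular_values), start=1):
        fraction = running / total
        if fraction >= retain:
            return k, fraction

    # Reached only if rounding leaves the final ratio just below `retain`;
    # keeping every component retains all of the variance.
    return len(singular_values), 1.0


# Hypothetical diagonal of S for a 5-feature data set
s_diag = [55.0, 30.0, 11.0, 3.5, 0.5]
k95, kept95 = choose_k(s_diag, retain=0.95)
k99, kept99 = choose_k(s_diag, retain=0.99)
print(f"95% target: k = {k95}, retained = {kept95:.3f}")
print(f"99% target: k = {k99}, retained = {kept99:.3f}")
```

Because only running sums of the diagonal are needed, every candidate k is checked in a single pass, without recomputing the projection for each k.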

• SUMMARY
• For data with around a thousand highly correlated dimensions, PCA can compress the data by a large factor while retaining most of the variance.