Advice for Applying PCA

2018-04-05 00:00 | Source

We know by now that PCA can compress the data while still representing the original data well, which can make a learning algorithm run faster.
Still, choose where to apply PCA wisely: not every problem benefits from PCA.

We can use PCA to speed up a supervised learning algorithm.
For this particular example, say we want to reduce the dimension of the input data to one tenth of the original.
First, extract only the x-values from the labeled training set, so that it becomes an unlabeled dataset.
Run PCA on these x-values to get the lower-dimensional z-values, then pair each z with its corresponding y-value to form the new training set.
Replace x with z and train the hypothesis on this new training set.
As a caution, compute the PCA mapping (Ureduce) by running PCA on the training set only. Then apply that same mapping to the cross-validation and test sets.
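A minimal NumPy sketch of these steps follows. It assumes mean normalization and the covariance-plus-SVD form of PCA. The function names fit_pca and apply_pca and the random data are hypothetical, chosen only for illustration. The mapping (mu, U_reduce) is learned from X_train alone and then reused unchanged for the cross-validation and test inputs.

```python
import numpy as np

def fit_pca(X_train, k):
    """Learn the PCA mapping from the training inputs only (y is not used)."""
    mu = X_train.mean(axis=0)                 # per-feature mean of the training set
    Xc = X_train - mu                         # mean-normalize
    Sigma = (Xc.T @ Xc) / X_train.shape[0]    # n x n covariance matrix
    U, _, _ = np.linalg.svd(Sigma)            # columns of U are the principal directions
    U_reduce = U[:, :k]                       # keep the first k directions
    return mu, U_reduce

def apply_pca(X, mu, U_reduce):
    """Map x -> z using the mapping learned on the training set."""
    return (X - mu) @ U_reduce

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 50))
X_cv = rng.normal(size=(20, 50))
X_test = rng.normal(size=(20, 50))
k = X_train.shape[1] // 10                    # one tenth of the original dimension

mu, U_reduce = fit_pca(X_train, k)            # fit on the training set only
Z_train = apply_pca(X_train, mu, U_reduce)
Z_cv = apply_pca(X_cv, mu, U_reduce)          # same mapping, not refit
Z_test = apply_pca(X_test, mu, U_reduce)      # same mapping, not refit
print(Z_train.shape, Z_cv.shape, Z_test.shape)  # (100, 5) (20, 5) (20, 5)
```

Each row of Z_train would then be paired with its original y-value to form the new training set.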
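For many problems, reducing the dimension to about one tenth of the original still retains most of the variance; that is roughly as far as it usually goes.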
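One common way to make "retains most of the variance" concrete is to pick the smallest k whose components keep, say, 99% of the variance. The sketch below assumes that 99% threshold and uses the singular values of the training covariance matrix. choose_k and the synthetic data are illustrative only.

```python
import numpy as np

def choose_k(X_train, retain=0.99):
    """Return the smallest k whose principal components keep `retain` of the variance."""
    Xc = X_train - X_train.mean(axis=0)          # mean-normalize with the training mean
    Sigma = (Xc.T @ Xc) / X_train.shape[0]       # n x n covariance matrix
    _, S, _ = np.linalg.svd(Sigma)               # S: variance along each direction, descending
    retained = np.cumsum(S) / np.sum(S)          # fraction retained for k = 1, 2, ..., n
    k = int(np.searchsorted(retained, retain) + 1)
    return min(k, len(S))                        # guard against float round-off near 1.0

# Synthetic data: 100 features generated from 10 underlying factors plus small noise
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 100))
X_train = factors @ mixing + 0.01 * rng.normal(size=(500, 100))

k = choose_k(X_train)
print(f"keep k = {k} of {X_train.shape[1]} dimensions")
```

Here almost all of the variance lives in a 10-dimensional subspace, so k comes out at most 10, well below the 100 original features.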

PCA does not take the y-values into account.
So using PCA to reduce the number of features in order to prevent overfitting is a bad use of it: the compression ignores the labels and may throw away information that is valuable for predicting y.
Regularization works well for this instead and is less likely to throw away valuable information, for example in logistic regression or neural networks.
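To see why ignoring y can hurt, here is a small synthetic sketch. The data, lambda, learning rate, and helper names are assumptions chosen for illustration. In this data, y depends only on a low-variance feature, so PCA down to k = 1 keeps the high-variance direction and drops the information about y. L2-regularized logistic regression on all features keeps that information. Plain batch gradient descent is used so the block stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 400
x1 = rng.normal(scale=10.0, size=m)   # high-variance feature, independent of y
x2 = rng.normal(scale=1.0, size=m)    # low-variance feature that alone determines y
X = np.column_stack([x1, x2])
y = (x2 > 0).astype(float)

# PCA to k = 1, computed from X alone: y is never used
Xc = X - X.mean(axis=0)
U, S, _ = np.linalg.svd(Xc.T @ Xc / m)
z = Xc @ U[:, :1]                     # direction is almost exactly the x1 axis

def standardize(F):
    """Feature scaling so gradient descent converges smoothly."""
    return (F - F.mean(axis=0)) / F.std(axis=0)

def add_bias(F):
    return np.column_stack([np.ones(len(F)), F])

def train_logreg(F, y, lam=1.0, alpha=0.5, iters=2000):
    """L2-regularized logistic regression by batch gradient descent (bias not regularized)."""
    F = add_bias(F)
    theta = np.zeros(F.shape[1])
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-(F @ theta)))   # sigmoid hypothesis
        grad = F.T @ (h - y) / len(y)
        grad[1:] += (lam / len(y)) * theta[1:]   # regularization term
        theta -= alpha * grad
    return theta

def accuracy(F, y, theta):
    return np.mean((add_bias(F) @ theta > 0) == (y == 1))

Fx, Fz = standardize(X), standardize(z)
acc_x = accuracy(Fx, y, train_logreg(Fx, y))
acc_z = accuracy(Fz, y, train_logreg(Fz, y))
print(f"all features + regularization: {acc_x:.2f}")
print(f"PCA to k = 1:                  {acc_z:.2f}")
```

The exact numbers depend on the seed. The pattern is what matters: the PCA-reduced model stays near chance because the label information was in the direction PCA discarded.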

Adding PCA also makes the system more complicated.
First try running the learning algorithm on the original data. Only if that doesn't work (for example, the data is too large to fit in memory, or training is too slow) should you apply PCA. PCA should not be the first plan for reducing the data.