The problem of overfitting
Regularization: a way to reduce overfitting
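High bias (underfit): the hypothesis is too simple for the data, e.g. a straight line fit to data that follows a curve
High variance (overfit): too many features (e.g. many high-order polynomial terms) and not enough data to pin them down, so the hypothesis fits the training set closely but generalizes poorly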
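To make the two failure modes concrete, here is a minimal sketch that fits polynomials of degree 1, 2 and 7 to eight points drawn from a quadratic trend. The data, the fixed perturbation standing in for noise, and the helper name `mse` are all assumptions made for illustration; they are not from the notes.

```python
import numpy as np

# Eight training points: quadratic trend plus a small fixed perturbation
# (an assumed stand-in for measurement noise, fixed so the run is deterministic).
x_train = np.linspace(-1.0, 1.0, 8)
noise = 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
y_train = x_train**2 + noise

# Dense grid of noise-free points, used as held-out data.
x_test = np.linspace(-1.0, 1.0, 201)
y_test = x_test**2

def mse(coeffs, x, y):
    """Mean squared error of a polynomial given in np.polyfit coefficient order."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

train_err, test_err = {}, {}
for degree in (1, 2, 7):
    # Least-squares polynomial fit of the chosen degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(coeffs, x_train, y_train)
    test_err[degree] = mse(coeffs, x_test, y_test)
    print(f"degree {degree}: train MSE = {train_err[degree]:.4f}, "
          f"held-out MSE = {test_err[degree]:.4f}")
```

The degree 1 fit cannot follow the curve (high bias), so its error is large on both sets. The degree 7 fit passes through all eight training points, so its training error is essentially zero; a training error that low says nothing about how well it does on the held-out grid, which is the overfitting risk.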
example for logistic regression
There is a tool for analyzing whether the algorithm is overfitting or underfitting...
With a lot of features there is a risk of ending up with a lot of high-order polynomial terms,
which also makes the hypothesis much harder to visualize (e.g. with over 100 features)
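As a rough illustration of how fast polynomial terms pile up, the sketch below counts the monomials of total degree at most d in n features (constant term included), which is the binomial coefficient C(n + d, d). The function name `num_poly_terms` is hypothetical.

```python
from math import comb

def num_poly_terms(n_features, max_degree):
    """Number of monomials of total degree <= max_degree in n_features variables,
    counting the constant term."""
    return comb(n_features + max_degree, max_degree)

for degree in (1, 2, 3):
    print(f"100 features, degree <= {degree}: {num_poly_terms(100, degree)} terms")
```

With 100 features this gives 101 terms for a linear hypothesis, 5151 for degree 2 and 176851 for degree 3.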
First option for addressing overfitting: reduce the number of features, either manually or automatically (to be discussed later in greater depth).
The disadvantage is that we may not know which features will turn out to be useful, and it may be that all of them matter, so dropping features can throw away information.
Second option: use regularization, where we keep all the features but reduce the magnitude of their parameters (weights), so that less useful features contribute less.
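A minimal sketch of the second option, assuming an L2 (ridge) penalty on a linear model with polynomial features. The notes do not say which penalty is meant, so the penalty form, the name `ridge_fit`, the parameter `lam` and the toy data are all assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares with an L2 penalty lam * sum(theta[1:]**2).

    The first column of X is assumed to be the intercept and is not penalized.
    """
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept term unpenalized
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

# Toy data: quadratic trend plus a small fixed perturbation (assumed for illustration).
x = np.linspace(-1.0, 1.0, 8)
y = x**2 + 0.1 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
X = np.vander(x, N=8, increasing=True)  # columns: 1, x, x^2, ..., x^7

norms = {}
for lam in (0.0, 0.01, 1.0):
    theta = ridge_fit(X, y, lam)
    norms[lam] = float(np.linalg.norm(theta[1:]))
    print(f"lambda = {lam}: norm of non-intercept weights = {norms[lam]:.3f}")
```

All eight columns stay in the model for every value of `lam`; only the size of the weights changes, shrinking as `lam` grows.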