- Choose which features to include
- Choose the right value for the regularization parameter lambda
- These decisions are part of what is called "model selection"
- A model that overfits the training set is not good at predicting future examples; its training error is an overly optimistic estimate of generalization error
- Suppose we are choosing which degree of polynomial d to use
- Here is how it works: for each d, fit the parameters theta on the training set
- We want to determine how well each value of d generalizes
- Evaluate each d by plugging its fitted parameters into the test error J_test(theta)
- Choose whichever degree has the lowest test error
- However, this is still not a fair way to estimate how well the chosen parameters generalize
- The chosen d may look best, but because it was selected to minimize the test error, J_test is now an optimistic estimate; the selection step may itself be overfitting the test set
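The flawed scheme above can be sketched as follows. This is a minimal illustration on synthetic data (the quadratic target, sample sizes, and use of `np.polyfit` as a stand-in for fitting theta are all assumptions, not the course's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a noisy quadratic (assumption, for illustration only)
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.1, 60)

# Split into training and test sets only -- the two-way split criticized above
x_train, y_train = x[:40], y[:40]
x_test, y_test = x[40:], y[40:]

def mse(theta, xs, ys):
    # Mean squared error, standing in for J(theta)
    return np.mean((np.polyval(theta, xs) - ys) ** 2)

# Fit theta on the training set for each degree d, then score each d
# by J_test -- note the test set is now being used for model selection
errors = {}
for d in range(1, 11):
    theta = np.polyfit(x_train, y_train, d)
    errors[d] = mse(theta, x_test, y_test)

best_d = min(errors, key=errors.get)
print("degree chosen on the test set:", best_d)
```

Because `best_d` was picked to minimize the test error, `errors[best_d]` is no longer an unbiased estimate of generalization error, which is exactly the problem the three-way split below addresses.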
- To address this problem, instead of splitting the data into two sets (training and test), split it into three: training, cross-validation, and test
- The final portion, containing m_test examples, is the test set
- Use the cross-validation (CV) set, not the test set, to select the model
- For example, d = 1 is the linear model
- First, fit the parameters on the training set for each degree, then evaluate each fitted model on the cross-validation set
- Pick the degree with the lowest cross-validation error (in this case, d = 4), then apply that model to the test set
- Use the resulting test error as the estimate of generalization error for the model we selected
- In other words: pick the model with the CV set, then use the test set to estimate its generalization error
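The three-way procedure can be sketched like this. Again a toy setup on synthetic data (the quartic target, the 60/20/20 split, and `np.polyfit` as the fitting step are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: a noisy quartic trend (assumption, for illustration only)
x = rng.uniform(-1, 1, 150)
y = x**4 - 2 * x**2 + 0.5 * x + rng.normal(0, 0.05, 150)

# Three-way split: ~60% training, ~20% cross-validation, ~20% test
x_tr, y_tr = x[:90], y[:90]
x_cv, y_cv = x[90:120], y[90:120]
x_te, y_te = x[120:], y[120:]

def j(theta, xs, ys):
    # Mean squared error, standing in for J(theta)
    return np.mean((np.polyval(theta, xs) - ys) ** 2)

# Step 1: fit parameters theta on the TRAINING set for each degree d
models = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 11)}

# Step 2: select the degree with the lowest CROSS-VALIDATION error J_cv
j_cv = {d: j(theta, x_cv, y_cv) for d, theta in models.items()}
best_d = min(j_cv, key=j_cv.get)

# Step 3: report the TEST error of the chosen model as the generalization
# estimate -- the test set played no part in fitting or selection
gen_error = j(models[best_d], x_te, y_te)
print("chosen degree:", best_d, "estimated generalization error:", gen_error)
```

The key design point is that each data subset is used exactly once: training fits theta, cross-validation picks d, and the test set gives an unbiased estimate for the final model.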
- J_cv will generally be lower than J_test, because we fit the parameters on the training set and then select the model (the degree) to minimize the cross-validation error
- We chose the degree with the lowest J_cv, computed on (x_cv, y_cv), so the selected model fits the cross-validation set better than an unseen set; its cost on the CV set therefore comes out lower than its cost on the test set
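This optimism of J_cv can be seen empirically by repeating the selection many times and averaging. The experiment below is a hypothetical sketch (synthetic quadratic data, 30/30/30 splits, degrees 1 to 6 are all assumptions), not a statement about any particular dataset:

```python
import numpy as np

rng = np.random.default_rng(2)

def trial():
    # Fresh noisy quadratic data each trial (assumption: toy setup)
    x = rng.uniform(-1, 1, 90)
    y = 1 - x + 2 * x**2 + rng.normal(0, 0.2, 90)
    x_tr, y_tr = x[:30], y[:30]
    x_cv, y_cv = x[30:60], y[30:60]
    x_te, y_te = x[60:], y[60:]

    def j(theta, xs, ys):
        return np.mean((np.polyval(theta, xs) - ys) ** 2)

    # Fit on training data, select degree on the CV set
    models = {d: np.polyfit(x_tr, y_tr, d) for d in range(1, 7)}
    j_cv = {d: j(theta, x_cv, y_cv) for d, theta in models.items()}
    d = min(j_cv, key=j_cv.get)
    # Return J_cv at the chosen d alongside J_test of the same model
    return j_cv[d], j(models[d], x_te, y_te)

# Average over many trials: because d was chosen to minimize J_cv,
# J_cv at the chosen d tends to come out lower than J_test
results = np.array([trial() for _ in range(200)])
mean_cv, mean_test = results.mean(axis=0)
print("mean J_cv:", mean_cv, "mean J_test:", mean_test)
```

J_test here stays honest because the test set never influenced which degree was picked; J_cv is biased low for exactly the reason the note gives.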
- Most machine learning practitioners use just a training set and a test set. It is better to keep three separate sets: a training set, a cross-validation set, and a test set.