
The inverse can't be taken directly because X is generally not a square matrix. That's why we multiply by the transpose: XtX always produces a square matrix.
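As a rough sketch of why XtX helps, the snippet below solves the least-squares normal equation (XtX) w = Xt y with NumPy. The function name `fit_normal_equation` and the toy data are assumptions made up for illustration; they are not from the lecture.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Least-squares weights from the normal equation (X^T X) w = X^T y."""
    XtX = X.T @ X   # shape (n_features, n_features): square even when X is not
    Xty = X.T @ y
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(XtX, Xty)

# Hypothetical data: y = 1 + 2x exactly, with a column of ones for the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # shape (4, 2), not square
y = 1.0 + 2.0 * x
w = fit_normal_equation(X, y)               # weights for [intercept, slope]
```

Here X has 4 rows and 2 columns, so it has no inverse, but XtX is 2 by 2 and can be solved as long as the columns of X are linearly independent.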

So errors in the data can come from many kinds of sources:

Sensor error: the machine senses the input incorrectly and therefore gives bad input.

Malicious error: the data is wrongly reported or interpreted on purpose. Because we want to gather big data from many different sources, collecting it all into one database can often lead to bad data.

Transcription error: human error while recording the data, which results in bad data.

Unmodeled influences: we may have modeled the problem wrongly ourselves. As shown in predicting house cost, size alone can't determine the price of a house. Other factors could matter (location, number of bedrooms, etc.).

The data drawn for the train, CV, and test sets should be IID (independent and identically distributed). That is, we assume all of the datasets come from the same distribution.

That is the fundamental assumption.

We want to take the function that we learned from the train set and test it on the test set. Training on the test set would not gain us anything useful, because we want our function to generalize to upcoming data.

After we build our model from the train set, we want to try different orders of polynomial to see if one of them gives a better fit.

In order to do that we need more data. Using the test set for this would once again be considered cheating.

What we can do is split off part of our train set to use as a cross-validation set.

Then we split our data and try many orders of polynomial.

In the table, each entry is assigned one order of polynomial.

We then pick the model with the lowest cross-validation error.

Each chunk of the data takes a turn as the cross-validation set.

For each order of polynomial, we train on the other chunks and test on the CV chunk.

Average the error over the chunks, and pick the order of polynomial with the lowest cross-validation error.
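The chunking procedure above can be sketched roughly as follows. This is a minimal version using `np.polyfit`, assuming the data is already in random order. The helper names `kfold_cv_error` and `pick_degree`, the choice of 5 chunks, and the synthetic cubic data are all assumptions for illustration, not part of the lecture.

```python
import numpy as np

def kfold_cv_error(x, y, degree, k=5):
    """Mean squared CV error of one polynomial order, averaged over k chunks."""
    folds = np.array_split(np.arange(len(x)), k)   # k chunks of indices
    errors = []
    for i in range(k):
        val_idx = folds[i]                         # this chunk is the CV set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))

def pick_degree(x, y, degrees, k=5):
    """Return the order with the lowest average CV error, plus all errors."""
    cv_errors = {d: kfold_cv_error(x, y, d, k) for d in degrees}
    best = min(cv_errors, key=cv_errors.get)
    return best, cv_errors

# Hypothetical noisy cubic data, generated in random order.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = x**3 - x + rng.normal(scale=0.05, size=60)
best_degree, cv_errors = pick_degree(x, y, degrees=range(1, 9))
```

Note that the test set is never touched here: only the train data is split into chunks, so the final test error stays an honest estimate.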

Now, the red line in the graph is the training error. As we increase the order of polynomial, the error goes down until about degree 3, the threshold, and after that the training error stays roughly the same.

The blue line shows the error when we test our function, for each order of polynomial, on the CV set. Here we see that it decreases at first, and then the CV error rises again as the order of polynomial increases.

As we learned in machine learning, increasing the order of polynomial too far will overfit the existing data, and when we try the model on the CV set it gives us high error. Orders 3 and 4 give a generalization that handles both the train set and the CV set very well.
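To see the two curves numerically, here is a small sketch that fits each order of polynomial on a train split and records both the train error and the CV error. The synthetic data, the 30/10 split, and the helper `mse` are assumptions for illustration. The exact numbers depend on the random seed, so no output is shown.

```python
import numpy as np

# Hypothetical noisy cubic data, split into a train part and a CV part.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=40)
y = x**3 - x + rng.normal(scale=0.1, size=40)
x_train, y_train = x[:30], y[:30]
x_cv, y_cv = x[30:], y[30:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

degrees = list(range(1, 9))
train_err, cv_err = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, d)   # fit on the train set only
    train_err.append(mse(coeffs, x_train, y_train))
    cv_err.append(mse(coeffs, x_cv, y_cv))

for d, tr, cv in zip(degrees, train_err, cv_err):
    print(f"degree {d}: train={tr:.4f}  cv={cv:.4f}")
```

The train error can only go down or stay flat as the order grows, because each higher-order polynomial contains the lower-order ones as special cases. The CV error has no such guarantee, which is why it is the one used to pick the order.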

The input can be a scalar or a vector.

To predict a credit score, we are given job, age, assets, and other continuous inputs. A yes/no input can be taken as a boolean value. For something like hair color, we can enumerate the values or take the RGB number (the higher the RGB, the higher the score).
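One possible way to turn such inputs into a numeric vector is sketched below. The function `encode_applicant`, the field names, and the hair-color enumeration are all hypothetical and only meant to illustrate the boolean and enumeration ideas.

```python
import numpy as np

def encode_applicant(has_job, age, assets, hair_color):
    """Build a numeric input vector from mixed-type fields (hypothetical scheme)."""
    # Hypothetical enumeration: each category is mapped to an integer code.
    hair_levels = {"black": 0, "brown": 1, "blond": 2, "red": 3}
    return np.array([
        1.0 if has_job else 0.0,          # boolean input as 0/1
        float(age),                       # continuous input used as-is
        float(assets),                    # continuous input used as-is
        float(hair_levels[hair_color]),   # enumerated category
    ])

features = encode_applicant(has_job=True, age=35, assets=12000.0, hair_color="brown")
```

An enumeration like this imposes an ordering on the categories, so it only makes sense if that ordering is meaningful for the prediction.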