Implementation detail: Mean Normalization

Sometimes we face users with no rating about any product. For the movies example, the user 5 doesn't rating any movies at all
So what are we going to do? In the algorithm, suppose we insert users 5
Thetas5 try to learn two features ( Not n+1, just n)
Then from the equation, we know that the cost function is nullified. Because the cost function itself is trying to learn based on the user's rating (which he doesn't rate any at all), then the cost function equals zero
Then we left of the Regularization term. And we also know the regularization trying to makes vector thetas5 to become smaller, but theta5 already zero.
In other words, we have no data at all.
If we insert theta5 which equals zero to calculation for predicting user's rating, then we have zero prediction
This means that the user rate any movies zeros. For this, we can't recommend any movies at all, because the distance is huge, and it's hard to find the relatedness about other movies
Use Mean Normalization try to address this problem

Here's what we are going to do
First, we are going to compute mean for each of row in matrix Y
That is, we trying to average the rating of each movie
Question mark not count! So total of average minus question mark
Then, substract each of row element with mean to produce new matrix Y
This way, the average of row from new matrix Y equals zero
with this, like saying "this is dataset from users", new matrix Y treated as dataset for Collaborative Filtering algorithm(instead of original matrix Y)
then we add mean i for every movie i that user j rated, just like calculation above(prediction ratings)
This way we try to mean normalize the rating of all movies
Specifically for user 5, then prediction rating equals zero (because theta5 equals zero) but we're adding mean normalization i
So even if Eve doesn't rate any movies at all, she left of the prediction of all the movie
So this way we try to normalize the column equals zero
From this formula, theoretically we can also do the same thing to the mean row matrix Y, that is trying to predict the rating of movie that never been rated at all
This works for some problem, but for movies examples, this might not be a good case. Why we are trying to recommend movies that never been rated of any users? We can just drop the movies. If it good, then at the very least, it should get rating.
Don't recommend movies to users with no rating, this also have to keep in mind for similar problem
Taking the focus on user with don't rate at all is more important with movies that has no rating at all

RS doesn't divided by m.

In Summary, we can use Mean Normalization for pre-processing to make Collaboration Filtering Algorithm more powerful