• So far, we have the cost function for a single user with the m^(j) term removed (it is a constant, so dropping it does not change the minimizer)
  • Rather than fitting one user at a time, we want to learn the parameters for all users together by minimizing a single cost function
  • Each user has their own parameters, so the predicted rating for a given movie differs from user to user
  • Compared with the single-user formula explained above, the only change is an added summation over all users, in both the error term and the regularization term
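
For reference, the all-users objective described above can be written out as follows. This is a sketch in the usual notation for content-based recommenders, which these notes do not spell out, so the symbols are assumptions: n_u is the number of users, n the number of features, theta^(j) the parameter vector of user j, x^(i) the feature vector of movie i, y^(i,j) the rating user j gave movie i, r(i,j) = 1 when that rating exists, and lambda the regularization strength.

```latex
\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \;
\frac{1}{2}\sum_{j=1}^{n_u}\;\sum_{i:\,r(i,j)=1}
\left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right)^{2}
\;+\; \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}
\left( \theta_k^{(j)} \right)^{2}
```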

  • For the gradient descent update, the formula differs depending on whether k = 0 (the k = 0 parameter is not regularized)
  • The part gathered in blue is the partial derivative of the cost function, which gradient descent uses to minimize the cost function
  • Other advanced optimization algorithms can also be used to minimize the cost function J
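
The update rule above can be sketched in plain Python. This is a minimal illustration, not the course's reference code: the function names (`predict`, `cost`, `gradient_step`), the data layout, and the toy ratings are all assumptions made for the example. `theta[j]` holds the parameters of user j, `X[i]` the features of movie i with `X[i][0] == 1` as the bias feature, `Y[i][j]` the rating, and `R[i][j]` is 1 when user j rated movie i.

```python
from typing import List

Matrix = List[List[float]]


def predict(theta_j: List[float], x_i: List[float]) -> float:
    """Predicted rating: inner product of one user's parameters and one movie's features."""
    return sum(t * x for t, x in zip(theta_j, x_i))


def cost(theta: Matrix, X: Matrix, Y: Matrix, R: List[List[int]], lam: float) -> float:
    """Squared error over rated (movie, user) pairs plus regularization, summed over all users."""
    total = 0.0
    for j, theta_j in enumerate(theta):
        for i, x_i in enumerate(X):
            if R[i][j]:
                total += (predict(theta_j, x_i) - Y[i][j]) ** 2
        # Regularize every parameter except the bias term (k = 0).
        total += lam * sum(t * t for t in theta_j[1:])
    return 0.5 * total


def gradient_step(theta: Matrix, X: Matrix, Y: Matrix, R: List[List[int]],
                  lam: float, alpha: float) -> Matrix:
    """One gradient descent update for every user's parameter vector."""
    new_theta = []
    for j, theta_j in enumerate(theta):
        grad = [0.0] * len(theta_j)
        for i, x_i in enumerate(X):
            if R[i][j]:
                err = predict(theta_j, x_i) - Y[i][j]
                for k, x_ik in enumerate(x_i):
                    grad[k] += err * x_ik
        # The k = 0 case has no regularization term; k >= 1 adds lam * theta_k.
        for k in range(1, len(theta_j)):
            grad[k] += lam * theta_j[k]
        new_theta.append([t - alpha * g for t, g in zip(theta_j, grad)])
    return new_theta


# Toy data (hypothetical): 3 movies with one feature plus bias, 2 users.
X = [[1.0, 0.9], [1.0, 0.1], [1.0, 0.5]]
Y = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]
R = [[1, 1], [1, 1], [0, 0]]  # the third movie is unrated by both users
theta = [[0.0, 0.0], [0.0, 0.0]]

for _ in range(200):
    theta = gradient_step(theta, X, Y, R, lam=0.1, alpha=0.1)

print("final cost:", cost(theta, X, Y, R, lam=0.1))
print("predicted ratings for the unrated movie:", [predict(t, X[2]) for t in theta])
```

The learning rate `alpha` and regularization strength `lam` here are arbitrary values chosen so the toy problem converges; real data would need them tuned.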