will learn what features to use

- sometimes the Recommender Systems face the product that are unrated. For the movie rating examples, sometimes there are movies that never rated by users, and Recommender Systems doesn't know how much the movie rated.
- in other words, the movies can't be rated initially. x1 = [0;?;?]
- But what it can do is, try to gather rating information from all user; and infer what is the rating for the movies that unrated/unseen by users.
- For Alice and Bob, Recommender Systems can infer from all the movies they rated, they love romantic movies and hate action movies. And so c and d hate romantic movies and but love action movies
- take x1 movies for example. It receive a and b rating 5, c and d rating 0. The Recommender Systems then know that x1 ("Love At Last" )movie is romantic movie and not action movie. And also how much it rated
- then from x1 examples, we would know rating of the rest of the movies
- Remember FEATURE RANGE FROM 0 TO 1!!!
- We shoud infer from the table that user2 and user3 like romantic movies.
- And because movie1 is rated by user2 and user3, than movie1 should be in romantic movies category
- For romantic movies, user2 rate 3, and user3 rate 5
- For movie1, user2 rate 1.5 and user3 rate 2.5 for movie1
- Then from scale 0.5 to 1.0, the prediction produce (1.5/3 = 0.5 for user1 and 2.5/5 = 0.5 for user2) which means x1 for romantic-type user, rated 0.5

- so here's how we plot the algorithm
- given all of the users rated, sum all of the movies that has been rated, and try to minimize the squared error projection between the prediction and original output
- let's also not forget the regularization that makes the jump not too huge
- not only that, we also want to learn all the features from all of the movies.
- So we want to take the summation of the minimized cost function above, and sum of all the movies.
- Hopefully we get our working algorithm for all features for all movies. no feature missing for each of the movie
- We want to minimize the feature of x(n) by minus the cost function(with additional regularization)

- so this is what it means by collaborative filtering.
- The algorithm tried to helping each other, between feature of movies, and users rating
- we may have our initial guess of the rating of the users. And based on users rating, we try to predict the feature of the movies, and use it to predict user rating, and so on.
- This way the algorithm will eventually converged to optimized features of all of the movies, and all rating of all of the users
- this is plausible if users rated multiple features and hopefully movies rated by multiple users

- next, how we improve the algorithm to work efficiently