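This video shows four typical plots of the cost function for stochastic gradient descent, averaged over a number of examples: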



  1. Good stochastic convergence
    • The line goes up and down (it is stochastic), but it trends downward and levels off near the actual minimum of the cost function
    • The red line uses a smaller alpha: it converges more slowly, but settles at a slightly lower cost because the parameters oscillate less around the minimum
  2. Second
    • The blue line averages the cost over 1000 examples and is still noisy
    • Averaging over 5000 examples gives a smoother line (red), but the drawback is delayed feedback: we get only one point on the plot per 5000 examples
  3. Third
    • Here it looks as if the algorithm is not learning at all
    • The red line, averaged over more examples, shows that the cost may actually be trending down toward the minimum; we just could not see it because the blue line (1000 examples) is too noisy
    • Of course, the line may also stay flat even after more averaging; in that case, try using more features
  4. Fourth
    • The algorithm is diverging: the updates vastly overshoot the minimum. In this case, try a smaller alpha
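
The plots described above come from averaging the per-example cost over a window of examples. The following is a minimal sketch of that monitoring, assuming a one-feature linear regression with a squared-error cost. The helper names `make_noisy_line` and `sgd_with_cost_monitor`, the synthetic data, and all default values are hypothetical and chosen only for illustration.

```python
import random


def make_noisy_line(n, seed=0):
    """Hypothetical toy data: y = 1 + 2x plus Gaussian noise, x in [0, 1]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 1.0 + 2.0 * x + rng.gauss(0.0, 0.1)))
    return data


def sgd_with_cost_monitor(data, alpha=0.1, window=1000, seed=0):
    """One pass of SGD on y ~ theta0 + theta1 * x.

    Returns (theta, averaged_costs). averaged_costs[i] is the mean
    per-example cost over the i-th block of `window` examples.
    A trailing block shorter than `window` is dropped.
    """
    rng = random.Random(seed)
    examples = list(data)
    rng.shuffle(examples)  # visit the examples in random order

    theta0, theta1 = 0.0, 0.0
    averaged_costs = []
    running_sum, count = 0.0, 0

    for x, y in examples:
        error = theta0 + theta1 * x - y
        # Record the cost BEFORE updating theta on this example.
        running_sum += 0.5 * error * error
        count += 1

        # Stochastic gradient step on this single example.
        theta0 -= alpha * error
        theta1 -= alpha * error * x

        if count == window:
            averaged_costs.append(running_sum / window)
            running_sum, count = 0.0, 0

    return (theta0, theta1), averaged_costs
```

Running this on the same data with `window=1000` and `window=5000` gives a noisier and a smoother version of the same curve. The larger window produces fewer points, which is the delayed feedback mentioned in the second case.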



Some practitioners keep alpha constant, while others decrease alpha gradually over time.
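
A shrinking alpha can be sketched as a function of the iteration number. The notes do not say which schedule is used, so the form below, `const1 / (iteration + const2)`, is an assumption (one common choice), and the function name and constants are hypothetical.

```python
def decayed_alpha(iteration, const1=1.0, const2=10.0):
    """Learning rate that shrinks as the iteration number grows.

    const1 and const2 are extra constants that need tuning; the
    defaults here are arbitrary illustration values.
    """
    return const1 / (iteration + const2)


# With the defaults, alpha starts at const1 / const2 and keeps decreasing.
schedule = [decayed_alpha(i) for i in (0, 10, 90, 990)]
```

The cost of this approach is two more values to tune, which is one reason some people prefer a constant alpha.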