- often converge faster than plain gradient descent
- scale efficiently to problems with lots of features


these new algorithms will minimize the cost function for us; all we have to supply is code that computes the cost J(theta) and its derivative terms (the partial derivatives with respect to each theta_j)
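concretely, in Octave that means writing one function that returns both the cost value and the gradient vector... a minimal sketch, using a throwaway placeholder cost J(theta) = theta' * theta (not the example used later):

    % minimal shape of the function the optimizer will call repeatedly
    % placeholder cost J(theta) = theta' * theta, so the gradient is 2 * theta
    function [jVal, gradient] = exampleCost(theta)
      jVal = theta' * theta;   % scalar cost J(theta)
      gradient = 2 * theta;    % column vector of partial derivatives dJ/dtheta_j
    end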


the three other optimization algorithms are a lot more complex, and you shouldn't implement them yourself unless you are a professional in numerical computing...

for any language, choose a good library by testing a few of them on the particular problem we have... writing our own implementation is not really recommended


after we have written the cost function code (shown on the right of the slide), we can write the optimization code itself, passing in the options we set up

the Octave function fminunc takes a pointer (handle) to our cost function, an initial theta, and the options; it automatically chooses a learning rate and returns the optimal theta (optTheta)
optimset = builds the set of options for the optimization (e.g. whether we supply the gradient, and the maximum number of iterations)

here's how to implement it in Octave, once we have defined the cost function:
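a sketch of that call (the cost function itself, costFunction, is written out a bit further down; 'GradObj', 'on' tells fminunc we supply the gradient, and MaxIter = 100 is just an illustrative cap):

    options = optimset('GradObj', 'on', 'MaxIter', 100);   % we provide the gradient; cap at 100 iterations
    initialTheta = zeros(2, 1);                            % starting point, must have at least 2 elements
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);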

exitFlag reports whether the optimization converged (a value of 1 means the cost function converged)...
initialTheta has to be a vector with at least 2 elements; fminunc doesn't work on a one-dimensional theta

this example is the optimization of a simple quadratic function...
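a sketch of such a quadratic, taking J(theta) = (theta(1) - 5)^2 + (theta(2) - 5)^2 as an illustrative choice (minimum at theta = [5; 5]); saved as costFunction.m it is exactly what the fminunc call above expects:

    function [jVal, gradient] = costFunction(theta)
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;  % the quadratic cost
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);            % dJ/dtheta(1)
      gradient(2) = 2 * (theta(2) - 5);            % dJ/dtheta(2)
    end

running the call above with this cost function should return optTheta close to [5; 5], functionVal near 0, and exitFlag equal to 1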


remember that Octave indexes vectors starting from 1, so theta_0 is stored as theta(1), theta_1 as theta(2), and so on
to use this for our actual learning problem, we still need to write the code that computes that problem's cost function and gradient, and hand it to the same routine...
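for example, if that problem is logistic regression, the code might look like the sketch below; the name lrCostFunction, the toy X and y, and the anonymous-function wrapper are illustrative assumptions, not something fixed by these notes:

    1;  % mark this file as a script, not a function file

    % hypothetical logistic regression cost: X is m x (n+1) with a leading
    % column of ones, y is m x 1 with 0/1 labels, theta is (n+1) x 1
    function [jVal, gradient] = lrCostFunction(theta, X, y)
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));                            % sigmoid hypothesis
      jVal = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));  % logistic cost
      gradient = (1/m) * (X' * (h - y));                         % gradient(1) belongs to theta_0
    end

    % toy data just so the sketch runs end to end (assumed, for illustration)
    X = [ones(4, 1), [0; 1; 2; 3]];
    y = [0; 1; 0; 1];

    % fminunc expects a function of theta alone, so wrap the extra arguments
    options = optimset('GradObj', 'on', 'MaxIter', 400);
    initialTheta = zeros(size(X, 2), 1);
    [optTheta, cost, exitFlag] = fminunc(@(t) lrCostFunction(t, X, y), initialTheta, options);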


these are advanced optimization algorithms that can often do better than gradient descent..