-a simpler cost function
-applying gradient descent to logistic regression
-a fully working logistic regression algorithm
this is a simpler and more compact way of calculating the cost function:
taking advantage of y always being either 0 or 1,
the two cases can be combined into one line where the terms disable each other (when y = 1 the (1 - y) term vanishes, and when y = 0 the y term vanishes)..
this one-line function is essentially the cost function used for logistic regression
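written out, the one-line cost function is:

J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

when y^(i) = 1 the second term is zero, and when y^(i) = 0 the first term is zero, so each example contributes exactly one of the two log terms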
what's left is how to minimize J(theta)
the gradient descent update shown above already has the derivative of the cost with respect to theta plugged in
it is almost the same as for linear regression, except that the hypothesis h(x) is the sigmoid function stated above...
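written out, the update (repeated until convergence, updating all theta_j simultaneously) is:

\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

this looks identical to the linear regression update, but here h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} instead of h_\theta(x) = \theta^T x, so it is a different algorithm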
the same technique as in linear regression also applies for checking that gradient descent is converging to the global optimum...
(plot the value of the cost function J(theta) after every iteration and check that it decreases on every iteration)
there is also a fully vectorized implementation of logistic regression, sketched below...
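a minimal NumPy sketch of that vectorized version, with per-iteration cost tracking for the convergence check above (X is assumed to be the m x (n+1) design matrix with a leading column of ones, y the 0/1 label vector; the function and variable names here are mine, not from the course):

import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # the one-line cost function from above, vectorized over all m examples
    m = len(y)
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)  # clip to avoid log(0)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        # vectorized update: theta := theta - (alpha / m) * X' * (h - y)
        theta -= (alpha / m) * (X.T @ (h - y))
        J_history.append(cost(theta, X, y))  # record J(theta) to monitor convergence
    return theta, J_history

plotting J_history against the iteration number is exactly the convergence check above: the curve should decrease on every iteration if alpha is chosen well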
feature scaling also applies here and makes gradient descent converge faster
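a common way to do the scaling is mean normalization; a short sketch under the same assumptions as above (apply it to the raw features before adding the column of ones, since the intercept column has zero variance):

def scale_features(X_raw):
    # subtract the mean and divide by the standard deviation per feature,
    # so all features end up on a comparable range
    mu = X_raw.mean(axis=0)
    sigma = X_raw.std(axis=0)
    return (X_raw - mu) / sigma, mu, sigma

# the same mu and sigma must be reused to scale any new example before predicting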
this is one of the most widely used algorithms for classification...