Optimization Objective

  • Supervised learning: regardless of which algorithm you use, what often matters more is getting a lot of data, choosing wisely which features to incorporate, regularization, etc.
  • The Support Vector Machine (SVM) is one of the most powerful supervised learning algorithms; compared to logistic regression and neural networks, it can give a more powerful way of learning complex non-linear functions.
  • It is used in many industries, and it is the last supervised learning algorithm that Andrew Ng spends a large amount of time on.

  • SVM: a modified form of logistic regression.
  • For the hypothesis to approach the actual output y = 1, the value of z = θᵀx must be much larger than zero; when z is much larger than zero, the hypothesis approaches 1. It works the other way around for y = 0: z must be much smaller than zero so the hypothesis approaches 0.
  • Remember that these alternative terms nullify each other depending on whether y == 1 or y == 0: only one of the two cost terms contributes per example.
  • The formula above is for a single training example. Notice the z value of 1: that is the joint point between the slanted line and the flat line (for the y == 0 case, the joint point is at z = -1).
  • In logistic regression, with y = 1, as the cost function approaches zero (the hypothesis approaches the actual output), z becomes much larger than zero.
  • And on the bottom right, with y = 0, as the cost function approaches zero (the hypothesis approaches the actual output), z becomes much smaller than zero.
  • For the SVM, we modify the logistic cost to be piecewise linear: two line segments, one slanting down to the right and one flat at zero. This approximation gives the SVM a big computational advantage and makes it much easier to optimize.
  • With these two linear formulas, denoted cost1(z) and cost0(z), we are ready to build the SVM objective; see the sketch below.
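As a concrete reference, here is a minimal NumPy sketch of cost1(z) and cost0(z) next to the logistic costs they approximate (the function names are mine, not from the lecture):

```python
import numpy as np

def logistic_cost1(z):
    # Logistic cost when y == 1: -log(sigmoid(z)) = log(1 + exp(-z)),
    # computed stably with logaddexp.
    return np.logaddexp(0.0, -z)

def logistic_cost0(z):
    # Logistic cost when y == 0: -log(1 - sigmoid(z)) = log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def cost1(z):
    # SVM piecewise-linear stand-in for y == 1:
    # flat at 0 for z >= 1, a straight line rising as z drops below 1.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # SVM piecewise-linear stand-in for y == 0:
    # flat at 0 for z <= -1, a straight line rising as z grows above -1.
    return np.maximum(0.0, 1.0 + z)
```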

  • For the SVM, we get rid of the 1/m factor; this is another convention adopted for the SVM. Since 1/m is a positive constant, removing it does not change which parameters minimize the objective. See the examples in red for more intuition.
  • As another convention, we simplify the equation by denoting the cost term with A and the regularization term with B.
  • Now we modify that convention into the simpler form on the right.
  • λ is dropped; instead we use a C term, minimizing C·A + B rather than A + λ·B.
  • It is not that C literally equals 1/λ, but setting C to play the role of 1/λ produces the same result as λ.
  • When C is really small, B gets relatively more weight than C·A. We no longer weight B more heavily; instead we make A much lighter. Either way, we are still minimizing the same trade-off in the cost function; the two objectives are written out below.
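Written out, the move from the logistic objective (the A + λB form) to the SVM objective (the C·A + B form) looks like this:

```latex
% Regularized logistic regression: min over theta of A + lambda * B
\min_\theta \; \frac{1}{m}\sum_{i=1}^{m}\Big[ y^{(i)}\big(-\log h_\theta(x^{(i)})\big)
  + (1 - y^{(i)})\big(-\log(1 - h_\theta(x^{(i)}))\big)\Big]
  + \frac{\lambda}{2m}\sum_{j=1}^{n} \theta_j^2

% SVM: drop the 1/m factors, swap in cost1/cost0, and reweight as C * A + B
\min_\theta \; C \sum_{i=1}^{m}\Big[ y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)})
  + (1 - y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)})\Big]
  + \frac{1}{2}\sum_{j=1}^{n} \theta_j^2
```

Scaling an objective by a positive constant never changes which θ minimizes it, which is why dropping 1/m and trading λ for C are safe conventions.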

  • This is the mathematical definition of the SVM.
  • Differing from logistic regression, the SVM does not compute a probability anymore. Instead the hypothesis outputs either 1 or 0 directly (not a range from 0 to 1 as in logistic regression) according to the condition stated above; see the sketch at the end of this section.
  • Next: what hypothesis the SVM produces, digging more into the optimization objective, and adding a little more to handle more complex non-linear functions.
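To make the hypothesis concrete, here is a minimal sketch of the SVM decision rule described above (the names and numbers are illustrative, not from the lecture):

```python
import numpy as np

def svm_predict(theta, X):
    # SVM hypothesis: output 1 where theta^T x >= 0, else 0.
    # No probability in between, unlike logistic regression.
    return (X @ theta >= 0).astype(int)

# Hypothetical usage; column 0 is the bias feature x0 = 1:
theta = np.array([-1.0, 2.0])
X = np.array([[1.0, 0.2],   # theta^T x = -0.6 -> predict 0
              [1.0, 0.8]])  # theta^T x =  0.6 -> predict 1
print(svm_predict(theta, X))  # prints [0 1]
```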