# Kernels II

• This video continues the previous kernels video on defining new features.
• It covers what was missing there: how to choose the landmarks, and the bias/variance trade-off for SVMs with kernels.
• For every training example we have, put a landmark at exactly that example's location.
• That way each feature measures the similarity between an example and a landmark, and we end up with as many landmarks as training examples (m landmarks for m examples).
• The landmarks therefore live in the same feature space as the training examples (say x1/x2); the coloring of the plot shown on the right doesn't matter much.
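The idea above can be sketched in code (a minimal sketch with my own names and made-up data, not from the lecture): the Gaussian kernel serves as the similarity measure, and the landmarks are placed exactly at the training examples.

```python
import numpy as np

def gaussian_similarity(x, landmark, sigma=1.0):
    """similarity(x, l) = exp(-||x - l||^2 / (2 * sigma^2))"""
    return np.exp(-np.sum((x - landmark) ** 2) / (2.0 * sigma ** 2))

# m training examples -> m landmarks, one at each example's location
X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 1.0]])
landmarks = X_train.copy()
```

An example sitting exactly on its landmark gets similarity 1; examples far away get similarity near 0.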

• This section is where we put the SVM together with kernels.
• First, create m landmarks l(1), ..., l(m), placing one at the location of each training example x(i).
• Then, for any example x (from the training, cross-validation, or test set), compute the feature vector f.
• f gathers the similarities between x and all the landmarks, f1 = similarity(x, l(1)) through fm = similarity(x, l(m)), wrapped into an m-dimensional vector.
• In other words, each example x(i) is mapped to an m-dimensional vector f(i) with elements f1 through fm.
• Among those values there is always one, the element at position i, that compares x(i) to its own landmark and therefore equals exactly 1.
• Then, instead of representing the example by x(i), we represent it by f(i), the m-dimensional vector whose elements are the similarities of x(i) to each landmark.

• Instead of the usual hypothesis in terms of x, we use f: predict y = 1 whenever theta transpose times f is >= 0.
• Looking at the formula, each theta_j is multiplied by the corresponding feature f_j.
• Since f has one component per training example, theta must have the same number of components as there are training examples.
• So n, the number of features, ends up equal to m (same number of features as training examples).
• How do we get the parameter values theta? By minimizing the SVM training cost function discussed earlier.
• In that SVM formula, x(i) is simply replaced by f(i).
• The regularization term changes slightly: the sum now runs over n = m parameters, and theta0 is still not regularized.
• For large problems this is computationally expensive: we now have as many parameters as training examples.
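Written out, the minimization described above (the SVM cost from earlier in the course, with x(i) replaced by f(i)) looks like:

```latex
\min_{\theta}\; C \sum_{i=1}^{m} \left[ y^{(i)}\,\mathrm{cost}_1\!\left(\theta^{T} f^{(i)}\right) + \left(1 - y^{(i)}\right)\mathrm{cost}_0\!\left(\theta^{T} f^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{n=m} \theta_j^{2}
```

Note the regularization sum runs from j = 1, leaving theta_0 unregularized, and the number of features n equals the number of training examples m.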

• So implementations slightly modify the regularization term to keep the computation efficient.
• Instead of the usual regularization term (which is thetaT * theta, excluding theta0), they use thetaT * M * theta.
• M is a matrix that depends on the kernel; it gives a rescaled version of the parameter norm being minimized.
• This is a mathematical detail, done purely for computational efficiency.
• Kernels could in principle be applied to logistic regression as well, but those optimizations don't carry over, so it would be really slow.
• The SVM, by its character, goes well with kernels, and with advanced optimization techniques developed specifically for SVMs it can be really efficient.
• Use an external package rather than writing your own solver: existing implementations are well tested and compute really fast, and SVMs are built into common libraries.
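As a small sketch of the "use a library" advice, scikit-learn's `SVC` (assuming scikit-learn is installed; the data here is made up) exposes the Gaussian kernel as `kernel='rbf'`, where `gamma` plays the role of 1/(2·sigma²):

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Toy data, made up for illustration
X = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0], [5.0, 1.0]])
y = np.array([0, 0, 1, 1])

# kernel='rbf' is the Gaussian kernel; gamma corresponds to 1 / (2 * sigma^2)
clf = SVC(C=1.0, kernel="rbf", gamma=0.5)
clf.fit(X, y)
pred = clf.predict(np.array([[4.5, 0.5]]))  # a point near the class-1 examples
```

The library handles landmark placement, the kernel computation, and the efficient optimization internally; we only pick C and gamma.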

• One more topic worth mentioning for SVMs with kernels: bias and variance.
• Choose the value of C wisely: as we learned, a higher C tends toward overfitting (lower bias, higher variance), while a lower C tends toward underfitting (higher bias, lower variance).
• The same applies to sigma squared in the Gaussian kernel: with a bigger sigma the features fi vary more smoothly over a longer range (higher bias, lower variance), while a smaller sigma makes them fall off rapidly over a narrower range (lower bias, higher variance).
• In summary, that is the SVM with kernels algorithm and how its parameters affect its behavior.
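The effect of sigma can be seen numerically (a small sketch with made-up values): the same distance between a point and a landmark looks "similar" under a large sigma and "dissimilar" under a small one.

```python
import numpy as np

def gaussian_similarity(x, landmark, sigma):
    return np.exp(-np.sum((x - landmark) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.0, 0.0])
l = np.array([2.0, 0.0])  # a landmark 2 units away from x

wide = gaussian_similarity(x, l, sigma=3.0)    # large sigma: f varies smoothly
narrow = gaussian_similarity(x, l, sigma=0.5)  # small sigma: f falls off fast
# wide is close to 1, narrow is close to 0, for the same distance
```

This is exactly the smooth/long-range versus rapid/narrow behavior of fi described above.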