Non-linear hypothesis
- an old idea that fell out of favor, but still one of the most powerful learning algorithms for many ML problems
Regularized Logistic Regression
- regularization applied both to gradient descent on the cost function and to the more advanced optimization methods that use the cost function and its derivative (sketch below)
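a minimal sketch of one regularized gradient descent step in Octave (assuming X has a leading column of ones and y, theta, alpha, lambda are already defined; theta(1) is the bias term and is not regularized):

    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));                         % sigmoid hypothesis
    grad = (1/m) * X' * (h - y);                            % unregularized gradient
    grad(2:end) = grad(2:end) + (lambda/m) * theta(2:end);  % regularization term, skipping theta(1)
    theta = theta - alpha * grad;                           % gradient descent update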
Regularized Linear Regression
- regularization applied to both the gradient descent and normal equation algorithms for linear regression (sketch below)
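a minimal Octave sketch of the regularized normal equation (assuming X is m x (n+1) with a leading column of ones and y is m x 1):

    n = size(X, 2) - 1;
    L = eye(n + 1);
    L(1, 1) = 0;                                  % don't regularize the bias term
    theta = pinv(X' * X + lambda * L) * X' * y;   % regularized normal equation

as noted in the lecture, with lambda > 0 the matrix X' * X + lambda * L is invertible even when X' * X itself is not.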
The problem of overfitting
Regularization : a way to reduce the overfitting problem
high bias (underfit) : the chosen hypothesis fits the data poorly, e.g. a straight line forced through clearly non-linear data
high variance (overfit) : many features (e.g. many high-order polynomial terms) but too little data to pin down a good hypothesis
the same thing happens in logistic regression (analogous example)
there are tools for diagnosing whether an algorithm is overfitting or underfitting (discussed later)...
with a lot of features the hypothesis can contain many high-order polynomial terms,
making it very hard to visualize (especially with over 100 features)
first option: reduce the number of features, either manually or with automatic feature-selection algorithms that will be discussed later in greater depth..
the disadvantage is that discarding features also discards information: we don't know beforehand which features will turn out to be useful, and possibly all of them matter...
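the second option is regularization: keep all the features but penalize large parameter values. a minimal Octave sketch of a regularized linear regression cost (X, y, theta, lambda assumed defined; theta(1) is not penalized):

    m = length(y);
    h = X * theta;                                % hypothesis
    J = (1/(2*m)) * sum((h - y).^2) ...           % squared-error term
        + (lambda/(2*m)) * sum(theta(2:end).^2);  % regularization penalty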
multiclass classification
- the one-vs-all classification algorithm
classification with more than two groups
whether the class labels start from index 0 or 1 doesn't matter..
the algorithms defined earlier compute binary classification using logistic regression...
the one-vs-all algorithm extends this to multiclass classification.
- the algorithm essentially makes two groups, pitting each class against the rest... it trains one hypothesis per class (denoted by a superscript), then predicts the class whose hypothesis is highest (sketch below)
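a minimal Octave sketch of the prediction step (assuming all_theta holds one row of fitted parameters per class and X has a leading column of ones; the function name is just illustrative):

    function p = predictOneVsAll(all_theta, X)
      h = 1 ./ (1 + exp(-X * all_theta'));   % m x K matrix: one hypothesis per class
      [maxh, p] = max(h, [], 2);             % pick the class with the highest hypothesis
    end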
Advanced Optimization
- optimization algorithms that converge faster than plain gradient descent
- scaling efficiently to problems with lots of features
these algorithms take the cost function and its derivative terms and minimize the cost
after we write the code at right... we can then write the optimization code
given the options set up front,
optimset = set of options for optimization
exitFlag is there to verify that the cost function has converged...
initialTheta has to be a vector with at least 2 elements
we still need to supply the code that computes the cost function and its gradient (sketch below)...
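the Octave pattern from the lecture, sketched with a toy cost J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2:

    function [jVal, gradient] = costFunction(theta)
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % cost J(theta)
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);             % dJ/dtheta1
      gradient(2) = 2 * (theta(2) - 5);             % dJ/dtheta2
    end

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = ...
        fminunc(@costFunction, initialTheta, options);

setting 'GradObj' to 'on' tells fminunc that costFunction also returns the gradient.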
Simplified cost function and gradient descent
-simpler cost function
-apply gradient descent to logistic regression
- a complete, working logistic regression
this is a simpler and more compact way of writing the cost function
taking advantage of y being either 0 or 1,
the cost collapses into one line, Cost(h(x), y) = -y*log(h(x)) - (1-y)*log(1-h(x)), where each term switches off when y takes the other value..
(a similarly compact one-line cost is what's used for linear regression, there with the squared error)
what's left is how to minimize J(theta)
the gradient descent update shown above already includes the derivative of the cost with respect to theta
(plot J(theta) at every iteration and check that gradient descent is decreasing it each time)
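a minimal Octave sketch of the compact cost plus one gradient descent step (X, y, theta, alpha assumed defined):

    m = length(y);
    h = 1 ./ (1 + exp(-X * theta));                           % sigmoid hypothesis
    J = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));    % one-line cost
    theta = theta - (alpha/m) * X' * (h - y);                 % gradient descent update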
Decision Boundary
- intuition for the logistic regression hypothesis.
as z goes toward positive infinity, g(z) asymptotes at 1
so, to classify between 0 and 1,
we get hypothesis >= 0.5 for y = 1
and hypothesis < 0.5 for y = 0
the region (the purple and blue areas) is set not by the training set but by the values of theta (the parameters). here the thetas are -3, 1, 1 respectively, as predefined earlier....
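worked out: with theta = [-3; 1; 1], the hypothesis predicts y = 1 whenever theta' * x >= 0, i.e. -3 + x1 + x2 >= 0, so the decision boundary is the straight line x1 + x2 = 3..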
the first graph shows a non-linear decision boundary: y = 1 inside the purple circle... and y = 0 everywhere else (outside the circle)...
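for example, in the lecture's circle case: with features [1, x1, x2, x1^2, x2^2], a theta of [-1; 0; 0; 1; 1] gives the circular boundary x1^2 + x2^2 = 1 of radius 1; flipping the signs of theta flips which side of the circle predicts y = 1..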
again, the decision boundary is determined by the values of theta..
the training set is used to produce the parameter values, but it is the parameters themselves that create the decision boundary...
the second graph uses more complex parameters with higher-order polynomial features...
as we can see, the resulting boundary is far more complicated...
these visualizations give a better sense of the range of hypotheses logistic regression can represent...
next, we look at how to automatically choose the parameter values from the training set...
Classification
linear regression is a poor fit for classification: its output can stray far outside the 0-1 range, and outliers can skew the fitted line
hypothesis representation
the target is to build :
a hypothesis whose output always lies between zero and one, 0 <= h(x) <= 1.
likewise, as z goes toward negative infinity, g(z) approaches 0
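here g is the sigmoid (logistic) function g(z) = 1 / (1 + e^-z), applied to z = theta' * x. a minimal Octave sketch (the function name is illustrative):

    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));   % elementwise, so z may be a scalar, vector, or matrix
    end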
h(x) is interpreted as the estimated probability that y = 1
so if h(x) = 0.7, there is a 70% chance that y = 1;
i.e. a 70% chance of the tumor being malignant
x0 is, as usual, set to 1
and x1 is the tumor size feature