-
Spam Classification Example
-
This topic may seem a little more complicated because it works at a higher level of abstraction than other areas of machine learning.
-
For more complex machine learning problems, the math formulas in this video tend to help a lot.
-
This lesson is about how to prioritize work on a machine learning problem.
-
How do we use supervised learning to classify email as spam or non-spam?
-
The spam example (on the left) is marked by digits substituted for letters in its words.
-
Features: choose words related to purchasing as indicators of spam, and words such as a person's name as indicators of non-spam.
-
This is spam/non-spam classification using logistic regression.
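To make the logistic regression step concrete, here is a minimal sketch of the prediction only (no training). The weights in `theta` and the three feature words are made-up values for illustration and do not come from the lecture.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_spam_probability(theta, x):
    """Return sigmoid(theta . x) for a binary feature vector x.

    theta[0] is the bias term; theta[1:] pairs up with the entries of x.
    """
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

# Hypothetical weights for three features: "buy", "discount", "alice".
# Positive weights push toward spam, negative weights toward non-spam.
theta = [-1.0, 2.0, 1.5, -3.0]

p_spam = predict_spam_probability(theta, [1, 1, 0])  # contains "buy", "discount"
p_ham = predict_spam_probability(theta, [0, 0, 1])   # contains only "alice"
print(p_spam > 0.5, p_ham > 0.5)
```

An email is labeled spam when the predicted probability is above a threshold, commonly 0.5.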
-
We create a feature vector from a list of 100 words.
-
For each word in the list, record whether it appears in the example email: mark 1 if it does, 0 otherwise.
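The 0/1 marking can be sketched as follows. The four-word `VOCABULARY` and the tokenizing regular expression are assumptions made for illustration; the notes use a list of 100 words.

```python
import re

# Hypothetical word list; the notes use 100 words, four are shown here.
VOCABULARY = ["buy", "deal", "discount", "alice"]

def email_to_features(text, vocabulary=VOCABULARY):
    """Return a list with 1 where the vocabulary word occurs in text, else 0."""
    # Lowercase and split into simple word tokens (an assumed tokenization).
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [1 if word in words else 0 for word in vocabulary]

features = email_to_features("Buy now! Big discount on watches")
print(features)
```

Only presence matters here, not how many times a word occurs.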
-
Rather than manually choosing the list of words, we can instead take the most frequent words (10K-50K) in the training examples.
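A minimal sketch of picking the most frequent words from the training examples. The three sample emails and the tiny `size=2` are made up so the result is easy to check; in practice the size would be in the 10K-50K range mentioned above.

```python
from collections import Counter
import re

def build_vocabulary(emails, size):
    """Return the `size` most frequent words across all training emails."""
    counts = Counter()
    for text in emails:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    return [word for word, _ in counts.most_common(size)]

# Made-up training examples, for illustration only.
training_emails = [
    "buy cheap watches now",
    "buy now and save",
    "meeting notes for monday",
]
vocabulary = build_vocabulary(training_emails, size=2)
print(vocabulary)
```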
-
How should we spend our time to make the classifier highly accurate?
-
Honeypot: create lots of fake email addresses and let them get spammed, so that the spam they receive can serve as training examples.
-
Gather data from the email header, which contains routing information. Spam sometimes takes an unusual route from its source.
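One way to get at the routing information is to read the `Received` headers with Python's standard `email` module. This is only a sketch: the sample message and host names are invented, and using the number of hops as a feature is an assumption about what "unusual route" could mean, not something the lecture specifies.

```python
from email import message_from_string

# Invented sample message with two relay hops.
RAW_EMAIL = """\
Received: from relay2.example.net by mx.example.com; Mon, 1 Jan 2024 10:00:02 +0000
Received: from relay1.example.org by relay2.example.net; Mon, 1 Jan 2024 10:00:01 +0000
From: sender@example.org
Subject: Hello

Body text.
"""

def routing_hops(raw_email):
    """Return the values of all Received headers (one per relay hop)."""
    msg = message_from_string(raw_email)
    return msg.get_all("Received", [])

hops = routing_hops(RAW_EMAIL)
hop_count = len(hops)  # candidate numeric feature for the classifier
print(hop_count)
```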
-
In the email body, misspellings in spam are often intentional, meant to evade filters that look for particular words.
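A rough sketch of undoing one kind of deliberate misspelling, digits standing in for letters, before looking words up in the feature list. The substitution table and the test words are assumptions made for illustration; a real filter would need a much richer approach.

```python
# Hypothetical digit-for-letter substitutions (far from complete).
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s"})

def normalize_word(word):
    """Lowercase a word and map look-alike digits back to letters.

    Tokens with no letters at all (such as "2024") are left unchanged,
    so that genuine numbers are not turned into nonsense words.
    """
    word = word.lower()
    if any(ch.isalpha() for ch in word):
        return word.translate(SUBSTITUTIONS)
    return word

print(normalize_word("W4tches"), normalize_word("Med1cine"), normalize_word("2024"))
```

The normalized word can then be matched against the same word list used for the feature vector.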
-
These are options for what to work on when building a spam classifier.
-
Machine learning practitioners often spend a lot of time fixated on one of these options. Sometimes this isn't fruitful at all.
-
What's not recommended is going by "gut feeling", such as picking whichever solution felt right when you woke up in the morning.
-
Next: error analysis, and how to choose where to spend time to improve learning performance.