Error metrics for Skewed Classes


Previously: error analysis — a single real-number error metric to tell how well the algorithm is doing.

  • Skewed classes: a somewhat trickier problem, where error analysis alone can mislead us.

  • Consider the cancer classification example, where our classifier gets 1% test error. Sounds like a good learning algorithm?
  • But it turns out only 0.50% of the patients actually have cancer. If we write a function that ignores x and always predicts y = 0 (every patient is cancer-free), we automatically get 0.5% test error. EVEN BETTER! (sarcastically)
  • When this happens, we call it skewed classes, and evaluation becomes a much harder problem.
  • Skewed classes: the data is not balanced; one class heavily outweighs the other.
  • In that case, a classifier that ignores the data entirely can score better on accuracy than one that actually uses it.
  • The problem: even if we improve the algorithm's accuracy, that number no longer tells us whether its predictions are genuinely better.
  • Better to come up with a different metric.
  • One such pair of error metrics is precision/recall.
  • Build a 2x2 table comparing the predicted class against the actual class (true/false positives and true/false negatives).
  • Precision: of all the patients we predicted to have cancer, the fraction that actually do (true positives / predicted positives). The higher the precision, the better.
  • Recall: of all the patients who actually have cancer, the fraction we correctly identified (true positives / actual positives). The higher the recall, the better our learning algorithm.
  • Using these metrics, it is no longer possible to cheat on skewed classes by predicting 0 or 1 all the time. For example, if we always predict y = 0 (no patient has cancer), then recall = 0 — we never catch a single patient who does have cancer.
  • With precision/recall, we can tell how well the algorithm is doing even on skewed classes. They are good evaluation metrics for classifiers on skewed classes, rather than just classification error/accuracy.
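
The definitions above can be sketched in code. A minimal example, assuming labels `y` (1 = has cancer, 0 = does not) and predictions `pred`; the function name and the toy data are illustrative, not from the lecture:

```python
def precision_recall(y, pred):
    """Compute precision and recall from binary labels and predictions."""
    tp = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 1)  # true positives
    fp = sum(1 for yi, pi in zip(y, pred) if yi == 0 and pi == 1)  # false positives
    fn = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0     # of actual positives, how many we caught
    return precision, recall

# A "cheating" classifier that always predicts y = 0 scores high accuracy
# on skewed data, but its recall = 0 exposes it immediately.
y    = [1, 0, 0, 0, 1, 0, 0, 0]
pred = [0, 0, 0, 0, 0, 0, 0, 0]
print(precision_recall(y, pred))  # → (0.0, 0.0)
```

Note how accuracy on this toy set would be 6/8 = 75% for the always-zero classifier, yet its recall of 0 shows it never detects a true case.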