Error Analysis

  • Previously: the problems that arise when fitting the parameters.
  • Now: how to use error analysis to make decisions systematically.

  • Start with the simplest algorithm that can be implemented quickly.
  • Avoid premature optimisation (famous advice in computer programming). Always build something simple first, and let evidence from learning curves tell us where to spend effort.
  • From the learning curves, we can decide whether to add more complexity (features/parameters) or more training data. This way we rely on evidence rather than "gut feeling".
  • Alternatively, people often use error analysis, which is used to spot patterns in the errors the algorithm makes.
  • Even so, a simple learning-curve diagnosis (high bias vs. high variance) still gives only a vague answer about exactly what to do next.
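
The learning-curve check described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: plain least-squares linear regression stands in for "the simple algorithm", the data is synthetic, and `make_data`, `learning_curve` and `squared_error` are hypothetical names. For each training-set size m it fits on the first m examples and reports the training error (on those m) and the cross-validation error (on the full CV set).

```python
import numpy as np

def squared_error(X, y, theta):
    """Cost J(theta) = (1 / 2m) * sum((X @ theta - y)^2), without regularization."""
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * len(y))

def learning_curve(X_train, y_train, X_cv, y_cv, sizes):
    """Train on the first m examples for each m in sizes; return (m, J_train, J_cv)."""
    curve = []
    for m in sizes:
        Xm, ym = X_train[:m], y_train[:m]
        # Least-squares fit stands in for the simple algorithm (an assumption).
        theta, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
        curve.append((m, squared_error(Xm, ym, theta), squared_error(X_cv, y_cv, theta)))
    return curve

# Hypothetical synthetic data: a noisy linear target plus a bias column.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, size=(n, 1))
    X = np.hstack([np.ones((n, 1)), x])
    y = 3 * x[:, 0] + 1 + rng.normal(scale=0.3, size=n)
    return X, y

X_train, y_train = make_data(200)
X_cv, y_cv = make_data(100)
for m, j_train, j_cv in learning_curve(X_train, y_train, X_cv, y_cv, [5, 20, 50, 200]):
    print(f"m={m:4d}  J_train={j_train:.3f}  J_cv={j_cv:.3f}")
```

Reading the output: if J_train and J_cv are both high and close together, the algorithm likely has high bias and more data alone will not help much (add features instead); a large, persistent gap between them suggests high variance, where more training data may help.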


  • Here is a specific example (a spam classifier).
  • Observe the 100 emails the algorithm misclassifies out of 500 cross-validation examples.
  • Manually examine these errors and categorize them by (i) the type of email and (ii) which cues (features) might have helped the algorithm classify them correctly.
  • From the first categorization (type of email), we might find, for example, that our algorithm mostly misclassifies "steal passwords" emails.
  • From the second (which features would be most helpful), we might find that the algorithm is weak at detecting unusual punctuation in the cross-validation errors. We can then build features to capture that rather than spend time on other ideas.
  • A simple, quick-and-dirty implementation helps us understand which examples are hard and which are easy.
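
The categorization above amounts to tallying hand-assigned labels. This is a minimal sketch: the records below are made up for illustration (in practice a person assigns them while reading each misclassified cross-validation email), the cue names other than "unusual punctuation" are hypothetical, and `tally` is a hypothetical helper built on the standard `collections.Counter`.

```python
from collections import Counter

# Hypothetical hand-labelled CV errors: each misclassified email is tagged by a
# person with its type and the cues (features) that might have caught it.
misclassified = [
    {"type": "steal passwords", "cues": ["unusual punctuation"]},
    {"type": "pharma", "cues": ["deliberate misspellings"]},
    {"type": "steal passwords", "cues": ["unusual punctuation", "unusual email routing"]},
    {"type": "replica/fake", "cues": []},
    {"type": "steal passwords", "cues": ["unusual punctuation"]},
]

def tally(errors):
    """Count errors per email type and per helpful cue (an email can have several cues)."""
    by_type = Counter(e["type"] for e in errors)
    by_cue = Counter(cue for e in errors for cue in e["cues"])
    return by_type, by_cue

by_type, by_cue = tally(misclassified)
print(by_type.most_common())
print(by_cue.most_common())
```

The largest counts point to where effort is most likely to pay off; with real data, it is the actual counts from the CV errors, not these illustrative ones, that should decide.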


  • We also want a single real-number evaluation metric (for example, the cross-validation error) that indicates how well the algorithm performs.
  • For example, in this NLP problem we could use stemming software so that similar words (e.g. "discount", "discounts", "discounted") are treated as the same word.
  • With a numerical evaluation, we can tell whether using stemming improves performance or not.
  • Likewise, we can tell whether distinguishing upper vs. lower case increases or decreases performance.
  • There are many possible implementations and variants of a learning algorithm, and a numerical evaluation lets us test which one has the lowest error.
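
Comparing variants with one number can be sketched as follows. This is a toy illustration under stated assumptions: `crude_stem` is a naive suffix stripper standing in for a real stemmer such as the Porter stemmer, the keyword classifier, `SPAM_WORDS` and the small CV set are all hypothetical, and the metric is the fraction of CV examples misclassified.

```python
def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (an assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text, stem, lowercase):
    words = text.split()
    if lowercase:
        words = [w.lower() for w in words]
    if stem:
        words = [crude_stem(w) for w in words]
    return words

SPAM_WORDS = {"discount", "buy", "now"}  # hypothetical keyword list

def make_classifier(stem, lowercase):
    """Hypothetical keyword classifier: spam (1) if any preprocessed word is a keyword."""
    keywords = {preprocess(w, stem, lowercase)[0] for w in SPAM_WORDS}
    def predict(text):
        return int(any(w in keywords for w in preprocess(text, stem, lowercase)))
    return predict

def cv_error(predict, cv_set):
    """Single real-number metric: fraction of CV examples misclassified."""
    wrong = sum(predict(text) != label for text, label in cv_set)
    return wrong / len(cv_set)

cv_set = [  # hypothetical labelled CV emails: (text, 1 = spam, 0 = not spam)
    ("Huge discounts today", 1),
    ("Buying guide attached", 0),
    ("DISCOUNT offer inside", 1),
    ("Meeting moved to noon", 0),
    ("DISCOUNTED prices", 1),
]

variants = {(stem, lower): cv_error(make_classifier(stem, lower), cv_set)
            for stem in (False, True) for lower in (False, True)}
for (stem, lower), err in variants.items():
    print(f"stem={stem!s:5} lowercase={lower!s:5} cv_error={err:.2f}")
best = min(variants, key=variants.get)
print("lowest CV error:", best)
```

On this made-up set, stemming plus lowercasing gives the lowest error even though stemming conflates "buying" with "buy" and causes one false positive; the single number captures that trade-off without relying on intuition.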


  • Use the cross-validation error to decide whether we need stemming.
  • New versions of the learning algorithm get implemented all the time; error analysis and the numerical evaluation tell us whether each one actually improves performance.
  • Most of the time, the simplest method is the right place to start.
  • Fast results give us insight into which specific problems need specific handling, and into better ways to handle or generalize them.
  • Don't be afraid of a quick-and-dirty implementation. Once we have a first version, we get a grip on which direction to go.
  • So, once again, implement the algorithm the quick-and-dirty way first. From the results, we can then spend our time wisely, deciding in which direction to add complexity to the learning algorithm for a particular problem.