Ceiling Analysis: What Part of the Pipeline to Work on Next

2018-04-05 00:00 | Source

So we have arrived at the conclusion that a developer's (or ML engineer's) time is precious and has to be spent wisely.
If we spend it on the wrong thing, we may waste a lot of effort for only a small performance increase.
This video covers ceiling analysis and how to use it to prioritise which part of our pipeline to work on next.

In ceiling analysis, we split the problem into the components of its pipeline and determine which component would benefit most from our effort.
The table on the bottom right of the slide lists the accuracy of the overall system.
Suppose we have run our system and it predicts characters with 72% accuracy.
Next, we manually supply the correct labels for the first step, text detection.
That is, instead of running the learning algorithm for text detection, we feed the rest of the pipeline the ground-truth text regions, so text detection is 100% accurate (text detection alone, not the whole system). We then run the remaining steps to the end, and overall accuracy rises to 89%.
Then we move on to step 2, character segmentation. We manually supply perfect labels for both text detection and character segmentation, then run the algorithm again. Accuracy rises to 90%.
Finally, supplying perfect labels for every stage of the pipeline gives 100% accuracy.
Looking at the increases, we get 17%, 1% and 10%. So text detection deserves most of our effort; we should not spend time on character segmentation (making it perfect adds only 1%), and character recognition comes next.
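The procedure above can be sketched in Python. This is a minimal sketch, not the course's code: it assumes each pipeline stage is a plain function and that per-example ground-truth outputs exist for every stage. The names `ceiling_accuracies`, `stages` and `truth`, and the toy pipeline, are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def ceiling_accuracies(
    examples: Sequence[Any],
    stages: Sequence[Callable[[Any], Any]],
    truth: Sequence[Sequence[Any]],
) -> List[float]:
    """Overall accuracy with the first k stages made perfect, for k = 0..len(stages).

    truth[i][j] is the (assumed available) ground-truth output of stage i on
    example j; truth[-1] holds the final labels the whole system is scored against.
    """
    results = []
    for k in range(len(stages) + 1):
        correct = 0
        for j, x in enumerate(examples):
            out = x
            for i, stage in enumerate(stages):
                # Stages before k are "made perfect": use the ground truth directly.
                # Later stages run the real component on the upstream output.
                out = truth[i][j] if i < k else stage(out)
            if out == truth[-1][j]:
                correct += 1
        results.append(correct / len(examples))
    return results


# Hypothetical toy two-stage pipeline: stage 0 should double its input,
# stage 1 should add one; each stage has a single bug.
examples = [1, 2, 3, 4]
stages = [
    lambda x: 0 if x == 3 else 2 * x,  # wrong on input 3
    lambda y: 0 if y == 8 else y + 1,  # wrong on input 8
]
truth = [[2, 4, 6, 8], [3, 5, 7, 9]]
print(ceiling_accuracies(examples, stages, truth))  # [0.5, 0.75, 1.0]
```

Stages are made perfect cumulatively (the first k of them), mirroring the walk-through: the last entry is always 1.0, and each step's increase is the ceiling for that stage.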
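Reading off the priorities is just a matter of differencing consecutive accuracies. A small sketch using the figures from the example above (72%, 89%, 90%, 100%); the function name `marginal_gains` is hypothetical.

```python
def marginal_gains(components, accuracies):
    """Pair each component with the overall accuracy gained by making it perfect.

    accuracies[0] is the baseline; accuracies[i + 1] is the overall accuracy
    with components[0..i] all replaced by ground truth.
    """
    if len(accuracies) != len(components) + 1:
        raise ValueError("need one accuracy per component plus the baseline")
    gains = [(name, accuracies[i + 1] - accuracies[i]) for i, name in enumerate(components)]
    # Largest gain first: that component has the highest ceiling, so work on it first.
    return sorted(gains, key=lambda g: g[1], reverse=True)


components = ["text detection", "character segmentation", "character recognition"]
accuracies = [72, 89, 90, 100]  # overall accuracy (%) from the photo OCR example
for name, gain in marginal_gains(components, accuracies):
    print(f"{name}: +{gain}%")
# text detection: +17%
# character recognition: +10%
# character segmentation: +1%
```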
Let's move on to the next example.

This face recognition example steps through a simplified process to better illustrate ceiling analysis.
First we remove the background, and then we detect the face.
The face is then split into three segmentation steps (eyes, nose and mouth), of which the eyes would be the most important. All segmentation outputs are fed to logistic regression, which produces the label of the person: their name.
A real system would probably be more complicated; this one is just for illustrating the process.

Again, we run ceiling analysis through the pipeline.
This points to three components that are most worth our effort (highlighted in magenta on the slide).
When we make the background-removal step perfect, overall performance increases by only 0.1%.
There was once a team of two engineers who spent 18 months perfecting background removal.
They published papers on it, and concluded that their algorithm did not increase the system's overall performance.
If only one of them had performed a ceiling analysis, they would not have wasted that effort.

So ceiling analysis gives us insight into where increases in performance can actually come from.
Over his years in ML, Andrew Ng has learned not to trust gut feeling but to rely on ceiling analysis, since it gives a clear priority for what we should work on.