Notes about Structuring ML projects – Coursera course (Part III)
If you haven’t had the chance to check the first part and the second part of my notes on the course Structuring Machine Learning Projects, I encourage you to do so as soon as possible; in any case, you can read this third part of my notes.
.
Error Analysis
Carrying out error analysis
To carry out error analysis, find a set of mislabeled examples in your dev (development) set, look at those examples for false positives and false negatives, and count up the number of errors that fall into various different categories.
Counting up the fraction of examples that are mislabeled in each way will often help you prioritize what to work on next.
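As a rough illustration of the counting step, here is a minimal Python sketch; the category names and the `tagged_errors` list are hypothetical placeholders for the tags you would assign while manually reviewing (say) 100 mislabeled dev-set examples.

```python
from collections import Counter

# Hypothetical tags from manually reviewing mislabeled dev-set examples.
# Each entry lists every category that applies to one mislabeled image.
tagged_errors = [
    {"dog"},
    {"great_cat", "blurry"},
    {"blurry"},
    # ... one entry per mislabeled example you reviewed
]

counts = Counter()
for tags in tagged_errors:
    counts.update(tags)

total = len(tagged_errors)
for category, count in counts.most_common():
    print(f"{category}: {count / total:.0%} of mislabeled examples")
```

The category with the largest fraction gives you the ceiling on how much dev-set error you could remove by working on that category alone.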
Take away
- Error analysis: manually examining mistakes that your algorithm is making can give you insights into what to do next.
- Ceiling on performance: Upper bound on how much you could improve performance by working on one problem.
- How to carry out error analysis: evaluate multiple ideas in parallel, using a table, in order to improve the algorithm. The conclusion of this process gives you an estimate of how worthwhile it might be to work on each of these different categories of errors.
.
Cleaning up: Incorrectly labeled data
When it comes to incorrectly labeled examples, deep learning algorithms are quite robust to random errors in the training set, but they are less robust to systematic errors.
Now, this discussion has focused on what to do about incorrectly labeled examples in your training set. What about incorrectly labeled examples in your dev set or test set? If you’re worried about their impact, the recommendation is to add one extra column during error analysis so that you can also count up the number of examples where the label Y was incorrect.
| Image | Dog | Great Cat | Blurry | Incorrectly Labeled | Comments |
|---|---|---|---|---|---|
| ... | | | | | |
| 98 | | | | ✓ | Labeler missed cat in background |
| 99 | | | | | |
| 100 | | | | ✓ | Drawing of a cat. Not a real cat. |
| % of total | 8% | 43% | 61% | 6% | |
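To make the ceiling-on-performance idea concrete, here is a back-of-the-envelope calculation using the per-category percentages from the table above and assuming the 10% overall dev-set error from Example A below; the numbers are only illustrative.

```python
overall_dev_error = 0.10  # assumed overall dev-set error (as in Example A below)

# Fraction of mislabeled examples falling into each category, from the table above.
category_fraction = {"dog": 0.08, "great_cat": 0.43, "blurry": 0.61, "incorrect_label": 0.06}

for category, fraction in category_fraction.items():
    # Best case: fixing this category entirely removes its share of the error.
    ceiling = overall_dev_error * fraction
    print(f"Fixing '{category}' could reduce dev error by at most {ceiling:.1%} "
          f"(from {overall_dev_error:.0%} to {overall_dev_error - ceiling:.1%})")
```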
Example A: is it worthwhile going in to try to fix up this 6% of incorrectly labeled examples?
| Overall dev set error | 10% |
|---|---|
| Errors due to incorrect labels | 0.6% |
| Errors due to other causes | 9.4% |
Our system has 90% overall accuracy and 10% error. In this case, 6% of the errors are due to incorrect labels, and 6% of 10% is 0.6%. Then look at the errors due to all other causes: if you made 10% error on your dev set and 0.6% of that is because the labels are wrong, the remaining 9.4% is due to other causes, such as misrecognizing dogs as cats, misrecognizing great cats, or blurry images.
So in this case, going in and fixing these incorrect labels is maybe not the most important thing to do right now.
.
Example B: now suppose the overall dev set error is down to 2%; is it worthwhile going in to try to fix up the incorrectly labeled examples?
| Overall dev set error | 2% |
|---|---|
| Errors due to incorrect labels | 0.6% |
| Errors due to other causes | 1.4% |
Let’s say the error is down to 2%, but still 0.6% of your overall error is due to incorrect labels. Now a very large fraction of your errors, 0.6% divided by 2%, which is 30% rather than 6%, is due to incorrectly labeled examples.
When such a high fraction of your mistakes, as measured on your dev set, is due to incorrect labels, it seems much more worthwhile to fix up the incorrect labels in your dev set.
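The arithmetic behind the two examples is simply the fraction of dev-set error attributable to incorrect labels; a minimal sketch:

```python
def incorrect_label_share(overall_dev_error, error_due_to_labels):
    """Fraction of the dev-set error caused by incorrectly labeled examples."""
    return error_due_to_labels / overall_dev_error

# Example A: 10% overall error, 0.6% from bad labels -> 6% of the errors
print(f"Example A: {incorrect_label_share(0.10, 0.006):.0%} of errors from incorrect labels")

# Example B: 2% overall error, same 0.6% from bad labels -> 30% of the errors
print(f"Example B: {incorrect_label_share(0.02, 0.006):.0%} of errors from incorrect labels")
```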
Conclusions
First, deep learning researchers sometimes like to say things like, “I just fed the data to the algorithm, trained it, and it worked.” There is a lot of truth to that in the deep learning era, but manually checking the examples can really help you prioritize where to go next.
Take away
- Manually checking the examples can really help you prioritize where to go next, so it is really valuable to carry out error analysis and then set a path based on your numbers.
- It’s really important that your dev and test sets come from the same distribution. But having your training set come from a slightly different distribution is often a pretty reasonable thing to do.
.
Build your first system quickly, then iterate
If you’re working on a brand new machine learning application, one piece of advice I often give people is that you should build your first system quickly and then iterate.
Depending on the area of application, the guidelines below will help you prioritize when you build your system (a minimal code sketch of this workflow follows the list).
- Set up a development/test set and metrics
- Set up a target
- Training set: Fit the parameters (build an initial system quickly)
- Development set: Tune the parameters
- Test set: Assess the performance
- Use Bias/Variance analysis & Error analysis to prioritize next steps
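Here is a minimal sketch of this workflow, assuming a scikit-learn-style classifier and a single-number dev metric; the toy data generated below is just a stand-in for your own training/dev/test split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy stand-in data; in practice this is your own training/dev/test split.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# 1. Build an initial system quickly: fit the parameters on the training set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 2. Development set: tune hyperparameters against a single evaluation metric.
dev_accuracy = accuracy_score(y_dev, model.predict(X_dev))
print(f"dev accuracy: {dev_accuracy:.1%}")

# 3. Test set: assess the final performance once you are done iterating.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.1%}")

# 4. Use bias/variance analysis and error analysis on the dev mistakes
#    to prioritize the next iteration.
```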
“Build your first system quickly, then iterate” applies less strongly if you’re working on an application area in which you have significant prior experience. It also applies less strongly if there’s a significant body of academic literature that you can draw on for pretty much the exact same problem you’re building.
Take away
- When starting to build a brand new machine learning application, build your first system quickly and then iterate.
- Set up a dev/test set and metric: this is really deciding where to place your target. And if you get it wrong, you can always move it later, but just set up a target.