Notes about Structuring Machine Learning Projects by Andrew Ng (Part II)

I am following the course “Structuring Machine Learning Projects” on Coursera and sharing a brief summary. An initial summary covers the first part of the course, and this is the second part.


General considerations:

  • This second part of week 1 is more complex than the initial set of talks, especially the second talk, “Avoidable bias”, and the third talk, “Understanding human-level performance”. I recommend watching both a couple of times to become familiar with the ideas and concepts they explain.
  • The examples developed in each talk serve a double purpose: introducing concepts and showing how to evaluate each situation correctly. In many cases they need a second review to be fully understood.


Comparing to human-level performance

1. Why human-level performance?

Why are we comparing machine learning systems to human-level performance?

  • Because ML algorithms are suddenly working much better, so competing with human-level performance has become feasible in many applications.
  • Because designing and building the workflow of an ML system is much more efficient when the task is one that humans can also do.

Progress tends to be relatively rapid as the algorithm approaches human-level performance. Once it surpasses human-level performance, progress in accuracy slows down.

Over time, as you keep training the algorithm, its performance approaches but never surpasses a theoretical limit called the Bayes optimal error.

Why is it so difficult to improve performance once the algorithm surpasses human-level performance?

  • Human-level performance is often close to the Bayes optimal error, so there is not much headroom left for improvement.
  • The tools that work while the algorithm is still worse than humans become harder to use:
    • Getting labeled data from humans.
    • Gaining insight from manual error analysis: why did a person get this right?
    • Better analysis of bias/variance.


Take away

  • ML algorithms now work well enough to compete with human-level performance in a lot of applications.
  • When building an algorithm, progress tends to be relatively rapid as you approach human-level performance. Once the algorithm surpasses it, progress in accuracy slows down.
  • Bayes optimal error: the best possible error.


2. Avoidable bias

Knowing the human-level performance makes it possible to tell whether the algorithm is doing well on the training set or not.

Example: Cat vs. Non-cat

| | Scenario A | Scenario B |
|---|---|---|
| Human error | 1% | 7.5% |
| Training error | 8% | 8% |
| Development error | 10% | 10% |

Scenario A

There is a 7% gap between the training error and the human-level error. This means the algorithm is not fitting the training set well, since the target is around 1%. To resolve this, we use bias reduction techniques such as training a bigger neural network or training longer.

Scenario B

The algorithm is doing well on the training set, since the training error is only 0.5% above the human-level error. The difference between the training error and the human-level error is called avoidable bias. The focus here should be on reducing the variance, since the 2% difference between the training error and the development error is the larger gap. To resolve this, we use variance reduction techniques such as regularization or a bigger training set.
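
The arithmetic behind both scenarios can be sketched in a few lines of Python. The function names `avoidable_bias` and `variance` are illustrative choices, not something from the course, and human-level error stands in for Bayes error.

```python
# Error rates in percent, taken from the Cat vs. Non-cat table above.
scenarios = {
    "A": {"human": 1.0, "train": 8.0, "dev": 10.0},
    "B": {"human": 7.5, "train": 8.0, "dev": 10.0},
}


def avoidable_bias(human_error, train_error):
    """Gap between training error and human-level error (proxy for Bayes error)."""
    return round(train_error - human_error, 4)


def variance(train_error, dev_error):
    """Gap between development error and training error."""
    return round(dev_error - train_error, 4)


for name, s in scenarios.items():
    bias = avoidable_bias(s["human"], s["train"])
    var = variance(s["train"], s["dev"])
    print(f"Scenario {name}: avoidable bias = {bias}%, variance = {var}%")
```

Scenario A gives 7.0 against 2.0 and Scenario B gives 0.5 against 2.0, which matches the conclusions reached for each scenario.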


Take away

  • By definition, human-level error cannot be lower than Bayes error, because nothing can do better than Bayes error. However, human-level error might not be too far from Bayes error.
  • Avoidable bias: the difference between the training error and the human-level error (used as a proxy for Bayes error).
  • Bias reduction techniques, which reduce the gap between the training error and the human-level error: training a bigger neural network or training longer.
  • Variance reduction techniques, which reduce the gap between the training error and the development error: regularization or a bigger training set.


3. Understanding human-level performance

A clear definition of human-level performance helps drive progress in a machine learning project.

Example 1: Medical image classification

This is a medical image classification example in which the input is a radiology image and the output is a diagnosis classification decision.

In this case, how should you define human-level error? Is human-level error 3%, 1%, 0.7% or 0.5%?

| | Classification error (%) |
|---|---|
| Typical human | 3.0 |
| Typical doctor | 1.0 |
| Experienced doctor | 0.7 |
| Team of experienced doctors | 0.5 |

The definition of human-level error depends on the purpose of the analysis. As a proxy for Bayes error, use 0.5%: a team of experienced doctors already achieves that error, so the optimal (Bayes) error must be lower than or equal to 0.5%.
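
As a minimal sketch of that reasoning, the Bayes-error estimate is just the lowest error in the table above. The variable names are illustrative assumptions.

```python
# Classification error in percent for each group, from the table above.
human_errors = {
    "Typical human": 3.0,
    "Typical doctor": 1.0,
    "Experienced doctor": 0.7,
    "Team of experienced doctors": 0.5,
}

# Bayes error cannot exceed any error that humans already achieve,
# so the lowest observed human error is the tightest upper bound.
best_group = min(human_errors, key=human_errors.get)
bayes_upper_bound = human_errors[best_group]

print(f"Bayes error <= {bayes_upper_bound}% ({best_group})")
```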

For the purpose of publishing a research paper or deploying a system in some particular context, you can perhaps use a different definition of human-level error, for example the 1% of a typical doctor, as long as the system surpasses that performance.


Example 2: Error analysis

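| | Scenario A (error %) | Scenario B (error %) | Scenario C (error %) |
|---|---|---|---|
| Human (proxy for Bayes error) | 1, 0.7 or 0.5 | 1, 0.7 or 0.5 | 0.5 |
| Training error | 5 | 1 | 0.7 |
| Development error | 6 | 5 | 0.8 |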

Scenario A

In this case, the choice of human-level performance doesn’t have an impact. The avoidable bias (gap between Bayes error and training error) is between 4% and 4.5%, and the variance is 1%. Therefore, the focus should be on bias reduction techniques.


Scenario B

In this case, the choice of human-level performance doesn’t have an impact. The avoidable bias is between 0% and 0.5%, and the variance is 4%. Therefore, the focus should be on variance reduction techniques.


Scenario C

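In this case, the choice matters. The estimate for Bayes error has to be 0.5%, the best human-level performance available, because Bayes error cannot be higher than an error that humans already achieve. With that estimate, the avoidable bias is 0.2% and the variance is 0.1%, so the focus should be on bias reduction techniques. Using 0.7% instead would put the avoidable bias at 0% and hide the remaining room for improvement on the training set.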
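
One way to check whether the choice of human-level error matters is to compute the suggested focus for every candidate proxy. This is a rough sketch: the helper names are hypothetical, and for Scenario C the 0.7 candidate is an assumption added for comparison, since the table lists only 0.5.

```python
def preferred_focus(human_error, train_error, dev_error):
    """Return 'bias' or 'variance', whichever gap is larger."""
    bias = round(train_error - human_error, 4)
    var = round(dev_error - train_error, 4)
    return "bias" if bias > var else "variance"


def focus_per_proxy(human_candidates, train_error, dev_error):
    """Map each candidate human-level error to the focus it would suggest."""
    return {h: preferred_focus(h, train_error, dev_error) for h in human_candidates}


# (candidate human errors, training error, development error), in percent.
# Scenario C's 0.7 candidate is a hypothetical alternative, not in the table.
scenarios = {
    "A": ([1.0, 0.7, 0.5], 5.0, 6.0),
    "B": ([1.0, 0.7, 0.5], 1.0, 5.0),
    "C": ([0.7, 0.5], 0.7, 0.8),
}

for name, (candidates, train, dev) in scenarios.items():
    result = focus_per_proxy(candidates, train, dev)
    stable = len(set(result.values())) == 1
    print(f"Scenario {name}: {result} -> choice matters: {not stable}")
```

In Scenarios A and B every candidate points to the same focus. In Scenario C the answer flips between the two candidates, which is why the tighter 0.5% estimate is the one to use.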


Take away

  • This video is about understanding human-level performance and how it is used in real projects.
  • Making progress on a machine learning problem gets harder as you approach or reach human-level performance.
  • In real-life projects there is no particular expectation of reaching 0% error, because sometimes Bayes error is non-zero and it is simply not possible for anything to do better than a certain threshold of error.
  • Having an estimate of human-level performance gives you an estimate of Bayes error. This lets you decide more quickly whether to focus on reducing the bias or the variance of your algorithm.


4. Surpassing human-level performance

Example 1: Classification task

| | Scenario A (error %) | Scenario B (error %) |
|---|---|---|
| Team of humans | 0.5 | 0.5 |
| One human | 1.0 | 1.0 |
| Training error | 0.6 | 0.3 |
| Development error | 0.8 | 0.4 |


Scenario A

In this case, the Bayes error is estimated at 0.5%, so the avoidable bias is 0.1% and the variance is 0.2%.


Scenario B

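In this case, there is not enough information to know whether bias reduction or variance reduction should be applied. The training error (0.3%) is already below the best human-level error (0.5%), so human-level performance no longer gives a usable estimate of Bayes error. This doesn’t mean that the model cannot be improved; it means that the conventional ways of choosing between bias reduction and variance reduction do not work in this case.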
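
The difference between the two scenarios can be sketched as a check on whether the training error has dropped below the best human benchmark. The function `diagnose` and its return format are illustrative assumptions, not part of the course.

```python
def diagnose(best_human_error, train_error, dev_error):
    """Return both gaps; avoidable bias is None when it cannot be estimated."""
    variance = round(dev_error - train_error, 4)
    if train_error < best_human_error:
        # The model already beats the best human benchmark, so that benchmark
        # is no longer a usable estimate of Bayes error.
        return {"avoidable_bias": None, "variance": variance}
    return {
        "avoidable_bias": round(train_error - best_human_error, 4),
        "variance": variance,
    }


# Error rates in percent; the team of humans (0.5) is the best human benchmark.
print("Scenario A:", diagnose(0.5, 0.6, 0.8))
print("Scenario B:", diagnose(0.5, 0.3, 0.4))
```

For Scenario B the avoidable bias comes back as `None`: the variance gap is still measurable, but there is nothing to compare it against.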


Scenarios where machine learning is better than human performance

There are problems where ML significantly surpasses human-level performance: online advertising, product recommendations, logistics (predicting transit time) and loan approvals.

All these examples learn from structured data: a database of what users have clicked on, a database of products users have bought before, a database of how long it takes to get from A to B, a database of previous loan applications and their outcomes. These are not natural perception problems.

Humans tend to be very good at natural perception tasks. So it is possible, but a bit harder, for computers to surpass human-level performance on natural perception tasks.

All of these are also problems where teams have access to huge amounts of data. For that reason, it is relatively easy for a computer to surpass human-level performance on them.


Take away

  • Problems where ML significantly surpasses human-level performance: online advertising, product recommendations, logistics (predicting transit time) and loan approvals.


5. Improving your model performance

There are 2 fundamental assumptions of supervised learning:

  • You can fit the training set pretty well.
  • The training set performance generalizes pretty well to the dev/test set.

If the difference between the human-level error and the training error is bigger than the difference between the training error and the development error, the focus should be on bias reduction techniques: training a bigger model, training longer, changing the neural network architecture or running a hyperparameter search.

If the difference between the training error and the development error is bigger than the difference between the human-level error and the training error, the focus should be on variance reduction techniques: getting a bigger data set, regularization, changing the neural network architecture or running a hyperparameter search.
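
The two rules above can be sketched as a small lookup. The function name `suggest_techniques` and the tie-breaking choice are assumptions made for illustration; the example numbers reuse the Cat vs. Non-cat scenarios from section 2.

```python
BIAS_TECHNIQUES = [
    "train a bigger model",
    "train longer",
    "change the neural network architecture",
    "run a hyperparameter search",
]
VARIANCE_TECHNIQUES = [
    "get a bigger data set",
    "add regularization",
    "change the neural network architecture",
    "run a hyperparameter search",
]


def suggest_techniques(human_error, train_error, dev_error):
    """Pick the technique list that matches the larger of the two gaps."""
    avoidable_bias = round(train_error - human_error, 4)
    variance = round(dev_error - train_error, 4)
    # Assumption: a tie falls through to variance; the notes do not cover ties.
    if avoidable_bias > variance:
        return "bias", BIAS_TECHNIQUES
    return "variance", VARIANCE_TECHNIQUES


focus, techniques = suggest_techniques(1.0, 8.0, 10.0)
print(f"Focus on {focus} reduction:")
for technique in techniques:
    print(f"  - {technique}")
```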

Take away

  • Machine learning strategy is about choosing the most promising direction among the things you could try.


Conclusions

Congratulations! If you have read the first part of the summary and this second part, you now have an idea of all the videos that make up the first week of the course “Structuring Machine Learning Projects”.

After these videos, the course has a short evaluation with 15 questions. It is quite simple, with no tricky questions, just a good review of the ideas presented.