Notes on Diagnosing a Learning Algorithm
Datasets
- Training set: 60%
- Cross validation set: 20%
- Test set: 20%
\(\displaystyle J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right) ^ 2\)
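As a small sketch (not from the notes), the test error above can be computed in NumPy for a linear hypothesis \(h_\theta(x) = \theta^T x\); the variable names here are illustrative:

```python
import numpy as np

def j_test(theta, X_test, y_test):
    """Squared test error with the 1/(2m) convention,
    for a linear hypothesis h_theta(x) = X @ theta."""
    m_test = X_test.shape[0]
    residuals = X_test @ theta - y_test
    return (residuals @ residuals) / (2 * m_test)

# Tiny check: a perfect fit gives zero test error.
X = np.array([[1.0, 0.0], [1.0, 1.0]])  # bias column + one feature
theta = np.array([2.0, 3.0])
y = X @ theta
print(j_test(theta, X, y))  # → 0.0
```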
Bias and Variance
- High bias (underfit): \(J_{train}(\theta)\) will be high and \(J_{CV}(\theta) \approx J_{train}(\theta)\)
- High variance (overfit): \(J_{train}(\theta)\) will be low and \(J_{CV}(\theta) \gg J_{train}(\theta)\)
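The two diagnostics above can be expressed as a crude rule of thumb. The thresholds `high` and `gap` below are illustrative assumptions, not values from the notes; in practice you compare the curves, not fixed cutoffs:

```python
def diagnose(j_train, j_cv, high=1.0, gap=0.5):
    """Rough bias/variance diagnosis from training and CV error.
    `high` and `gap` are illustrative thresholds, not canonical values."""
    if j_train > high and abs(j_cv - j_train) < gap:
        return "high bias"       # both errors high and close together
    if j_train < high and j_cv - j_train > gap:
        return "high variance"   # low training error, much higher CV error
    return "ok"

print(diagnose(2.0, 2.1))  # → high bias
print(diagnose(0.1, 1.5))  # → high variance
```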
Learning Curves
Plot \(J_{train}(\theta)\) and \(J_{CV}(\theta)\) against training set size: `plot(1:m, Jtrain, 1:m, Jcv)`
- Experiencing high bias:
- Low training set size: \(J_{train}(\theta)\) will be low and \(J_{CV}(\theta)\) will be high
- Large training set size: both \(J_{train}(\theta)\) and \(J_{CV}(\theta)\) will be high with \(J_{train}(\theta) \approx J_{CV}(\theta)\)
- Experiencing high variance:
- Low training set size: \(J_{train}(\theta)\) will be low and \(J_{CV}(\theta)\) will be high
- Large training set size: \(J_{train}(\theta)\) increases with training set size and \(J_{CV}(\theta)\) continues to decrease without leveling off; \(J_{train}(\theta) < J_{CV}(\theta)\), but the gap between them remains significant
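The curves above can be generated by refitting on growing prefixes of the training set. A minimal sketch with NumPy least squares on synthetic data (all names and the data-generating setup are assumptions for illustration):

```python
import numpy as np

def cost(theta, X, y):
    """Squared error with the 1/(2m) convention."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

def fit(X, y):
    """Least-squares fit of a linear hypothesis."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def learning_curves(X, y, X_cv, y_cv):
    """For each m, fit on the first m examples; record J_train on those
    m examples and J_CV on the full cross validation set."""
    j_train, j_cv = [], []
    for m in range(1, X.shape[0] + 1):
        theta = fit(X[:m], y[:m])
        j_train.append(cost(theta, X[:m], y[:m]))
        j_cv.append(cost(theta, X_cv, y_cv))
    return j_train, j_cv

# Synthetic linear data with a little noise (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=20)
X_cv = np.column_stack([np.ones(10), rng.normal(size=10)])
y_cv = X_cv @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=10)

j_train, j_cv = learning_curves(X, y, X_cv, y_cv)
# J_train starts near zero (few points are trivially fit) and grows;
# J_CV falls toward the noise floor as m increases.
```

These two lists are exactly what `plot(1:m, Jtrain, 1:m, Jcv)` would display.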
Deciding What to Do Next
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing \(\lambda\): Fixes high bias
- Increasing \(\lambda\): Fixes high variance
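The last two items can be automated by sweeping \(\lambda\) and keeping the value that minimizes \(J_{CV}(\theta)\). A sketch using the regularized normal equation on polynomial features; the candidate \(\lambda\) grid and the synthetic data are assumptions for illustration:

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Regularized normal equation; the bias column (first) is not penalized."""
    n = X.shape[1]
    L = lam * np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + L, X.T @ y)

def cost(theta, X, y):
    """Unregularized squared error with the 1/(2m) convention."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

# Degree-5 polynomial features over linear data: prone to overfitting.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=15)
X = np.column_stack([x**k for k in range(6)])
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=15)

x_cv = rng.uniform(-1, 1, size=15)
X_cv = np.column_stack([x_cv**k for k in range(6)])
y_cv = 1.0 + 2.0 * x_cv + 0.2 * rng.normal(size=15)

# Sweep lambda; keep the value with the lowest cross validation error.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
j_cv = [cost(fit_ridge(X, y, lam), X_cv, y_cv) for lam in lambdas]
best = lambdas[int(np.argmin(j_cv))]
```

Note that increasing \(\lambda\) always raises (or leaves unchanged) the training error, which is why it fights variance rather than bias.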
Error Analysis
- Start with a simple algorithm, implement it quickly, and test it early on your cross validation data
- Plot learning curves to decide if more data, more features, etc. are likely to help
- Manually examine the examples the algorithm misclassifies in the cross validation set and try to spot systematic trends in where most of the errors are made
Precision and Recall
| | actual 1 | actual 0 |
|---|---|---|
| predicted 1 | true positive | false positive |
| predicted 0 | false negative | true negative |
\(\text{accuracy} = \dfrac{tp + tn}{tp + tn + fp + fn}\)
\(\text{precision} = \dfrac{tp}{tp + fp}\)
\(\text{recall} = \dfrac{tp}{tp + fn}\)
\(F_1 = \dfrac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}\)
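The four formulas above map directly onto the confusion-matrix counts. A minimal sketch (the function name and example counts are illustrative):

```python
def prf(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many were real
    recall = tp / (tp + fn)             # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 8 TP, 2 FP, 4 FN, 86 TN out of 100 examples.
acc, p, r, f1 = prf(8, 2, 4, 86)
print(acc, p, r)  # accuracy 0.94, precision 0.8, recall 8/12
```

With a rare positive class (12 of 100 here), accuracy alone looks good even when recall is mediocre, which is why precision and recall are tracked separately.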