Notes on Diagnosing a Learning Algorithm
Datasets
- Training set: 60%
- Cross validation set: 20%
- Test set: 20%
\(\displaystyle J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right) ^ 2\)
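As a small sketch (not from the notes), the test error above can be computed in NumPy for a linear hypothesis \(h_\theta(x) = \theta^T x\); the variable names here are illustrative:

```python
import numpy as np

def j_test(theta, X_test, y_test):
    """Squared test error with the 1/(2m) convention,
    for a linear hypothesis h_theta(x) = X @ theta."""
    m_test = X_test.shape[0]
    residuals = X_test @ theta - y_test
    return (residuals @ residuals) / (2 * m_test)

# Tiny check: a perfect fit gives zero test error.
X = np.array([[1.0, 0.0], [1.0, 1.0]])  # bias column + one feature
theta = np.array([2.0, 3.0])
y = X @ theta
print(j_test(theta, X, y))  # → 0.0
```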
Bias and Variance
- High bias (underfit): \(J_{train}(\theta)\) will be high and \(J_{CV}(\theta) \approx J_{train}(\theta)\)
- High variance (overfit): \(J_{train}(\theta)\) will be low and \(J_{CV}(\theta) \gg J_{train}(\theta)\)
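The two diagnostics above can be expressed as a crude rule of thumb. The thresholds `high` and `gap` below are illustrative assumptions, not values from the notes; in practice you compare the curves, not fixed cutoffs:

```python
def diagnose(j_train, j_cv, high=1.0, gap=0.5):
    """Rough bias/variance diagnosis from training and CV error.
    `high` and `gap` are illustrative thresholds, not canonical values."""
    if j_train > high and abs(j_cv - j_train) < gap:
        return "high bias"       # both errors high and close together
    if j_train < high and j_cv - j_train > gap:
        return "high variance"   # low training error, much higher CV error
    return "ok"

print(diagnose(2.0, 2.1))  # → high bias
print(diagnose(0.1, 1.5))  # → high variance
```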
Learning Curves
Plot \(J_{train}(\theta)\) and \(J_{CV}(\theta)\) against training set size: `plot(1:m, Jtrain, 1:m, Jcv)`
- Experiencing high bias:
- Low training set size: \(J_{train}(\theta)\) will be low and \(J_{CV}(\theta)\) will be high
- Large training set size: both \(J_{train}(\theta)\) and \(J_{CV}(\theta)\) will be high with \(J_{train}(\theta) \approx J_{CV}(\theta)\)
- Experiencing high variance:
- Low training set size: \(J_{train}(\theta)\) will be low and \(J_{CV}(\theta)\) will be high
- Large training set size: \(J_{train}(\theta)\) increases with training set size and \(J_{CV}(\theta)\) continues to decrease without leveling off; \(J_{train}(\theta) < J_{CV}(\theta)\), but the gap between them remains significant
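The curves above can be generated by refitting on growing prefixes of the training set. A minimal sketch with NumPy least squares on synthetic data (all names and the data-generating setup are assumptions for illustration):

```python
import numpy as np

def cost(theta, X, y):
    """Squared error with the 1/(2m) convention."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

def fit(X, y):
    """Least-squares fit of a linear hypothesis."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def learning_curves(X, y, X_cv, y_cv):
    """For each m, fit on the first m examples; record J_train on those
    m examples and J_CV on the full cross validation set."""
    j_train, j_cv = [], []
    for m in range(1, X.shape[0] + 1):
        theta = fit(X[:m], y[:m])
        j_train.append(cost(theta, X[:m], y[:m]))
        j_cv.append(cost(theta, X_cv, y_cv))
    return j_train, j_cv

# Synthetic linear data with a little noise (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=20)
X_cv = np.column_stack([np.ones(10), rng.normal(size=10)])
y_cv = X_cv @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=10)

j_train, j_cv = learning_curves(X, y, X_cv, y_cv)
# J_train starts near zero (few points are trivially fit) and grows;
# J_CV falls toward the noise floor as m increases.
```

These two lists are exactly what `plot(1:m, Jtrain, 1:m, Jcv)` would display.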
Deciding What to Do Next
- Getting more training examples: Fixes high variance
- Trying smaller sets of features: Fixes high variance
- Adding features: Fixes high bias
- Adding polynomial features: Fixes high bias
- Decreasing \(\lambda\): Fixes high bias
- Increasing \(\lambda\): Fixes high variance
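The last two items can be automated by sweeping \(\lambda\) and keeping the value that minimizes \(J_{CV}(\theta)\). A sketch using the regularized normal equation on polynomial features; the candidate \(\lambda\) grid and the synthetic data are assumptions for illustration:

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Regularized normal equation; the bias column (first) is not penalized."""
    n = X.shape[1]
    L = lam * np.eye(n)
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + L, X.T @ y)

def cost(theta, X, y):
    """Unregularized squared error with the 1/(2m) convention."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

# Degree-5 polynomial features over linear data: prone to overfitting.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=15)
X = np.column_stack([x**k for k in range(6)])
y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=15)

x_cv = rng.uniform(-1, 1, size=15)
X_cv = np.column_stack([x_cv**k for k in range(6)])
y_cv = 1.0 + 2.0 * x_cv + 0.2 * rng.normal(size=15)

# Sweep lambda; keep the value with the lowest cross validation error.
lambdas = [0.0, 0.01, 0.1, 1.0, 10.0]
j_cv = [cost(fit_ridge(X, y, lam), X_cv, y_cv) for lam in lambdas]
best = lambdas[int(np.argmin(j_cv))]
```

Note that increasing \(\lambda\) always raises (or leaves unchanged) the training error, which is why it fights variance rather than bias.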
Error Analysis
- Start with a simple algorithm, implement it quickly, and test it early on your cross validation data
- Plot learning curves to decide if more data, more features, etc. are likely to help
- Manually examine the examples the algorithm misclassifies in the cross validation set and try to spot systematic trends in where most of the errors are made
Precision and Recall
| | actual 1 | actual 0 |
|---|---|---|
| predicted 1 | true positive | false positive |
| predicted 0 | false negative | true negative |
\(\text{accuracy} = \dfrac{tp + tn}{tp + tn + fp + fn}\)
\(\text{precision} = \dfrac{tp}{tp + fp}\)
\(\text{recall} = \dfrac{tp}{tp + fn}\)
\(F_1 = \dfrac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}\)
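The four formulas above map directly onto the confusion-matrix counts. A minimal sketch (the function name and example counts are illustrative):

```python
def prf(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many were real
    recall = tp / (tp + fn)             # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 8 TP, 2 FP, 4 FN, 86 TN out of 100 examples.
acc, p, r, f1 = prf(8, 2, 4, 86)
print(acc, p, r)  # accuracy 0.94, precision 0.8, recall 8/12
```

With a rare positive class (12 of 100 here), accuracy alone looks good even when recall is mediocre, which is why precision and recall are tracked separately.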