The reason I have picked the value is because I thought it would fit the model.
The machine's overall confusion over time
Discrepancy between taught and overall data distributions over time