I trained a multi-class classifier on images of cats, dogs and wild animals and passed an image of myself, it’s 98% confident I’m a dog. The problem isn’t that I passed an inappropriate image, because models in the real world are passed all sorts of garbage. It’s that the model is overconfident about an image far away from the training data. Instead we expect a more uniform distribution over the classes. The overconfidence makes it difficult to post-process model output (setting a threshold on predictions, etc.), which means it needs to be dealt with by the architecture.
In this post I explore a Bayesian method for dealing with overconfident predictions for inputs far away from training data in neural networks. The method is called last layer Laplace approximation (LLLA) and was proposed in this paper published in ICML 2020.
Why is this a problem?
You might argue “you only trained the classifier on animals, of course it breaks when you show it a human”, and you’re right. However, in the real world, we aren’t able to filter out animal images from non-animal images before sending it to the model, so we need it to be robust to garbage input. The animal-human example tries to replicate this on a small scale (one image). Properly quantifying uncertainty is important because we (as the practitioners training the models) can’t be confident in the model’s ability to generalize if it assigns arbitrarily high confidence to garbage input.
The 3-class classifier was trained on images of cats, dogs and wild animals taken from Kaggle that can be downloaded here.
To continue reading this article, click here.