Multiple Instance Learning (MIL) is a weakly supervised learning paradigm that is becoming increasingly popular because it requires less labeling effort than fully supervised methods. This is especially appealing in domains such as medicine, where building large annotated datasets remains challenging. Although recent deep learning MIL approaches have obtained state-of-the-art results, they are fully deterministic and do not provide uncertainty estimates for their predictions. In this work, we introduce the Attention Gaussian Process (AGP) model, a novel probabilistic attention mechanism for deep MIL based on Gaussian Processes. AGP provides accurate bag-level predictions as well as instance-level explainability, and it can be trained end-to-end. Moreover, its probabilistic nature provides robustness to overfitting on small datasets as well as uncertainty estimates for its predictions. The latter is especially important in medical applications, where decisions have a direct impact on the patient's health. The proposed model is validated experimentally as follows. First, its behavior is illustrated in two synthetic MIL experiments based on the well-known MNIST and CIFAR-10 datasets, respectively. Then, it is evaluated in three different real-world cancer detection experiments. AGP outperforms state-of-the-art MIL approaches, including deterministic deep learning ones. It performs strongly even on a small dataset with fewer than 100 labels and generalizes better than competing methods on an external test set. Moreover, we experimentally show that predictive uncertainty correlates with the risk of wrong predictions, and is therefore a good indicator of reliability in practice. Our code is publicly available.
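The abstract does not spell out the AGP architecture, so the following NumPy snippet is only a rough sketch of the general idea of GP-based attention for MIL, under stated assumptions. It assumes that attention logits come from a sparse variational GP (RBF kernel, inducing inputs `Z`, variational distribution q(u) = N(m, S)) evaluated at instance embeddings `H`, that logits are sampled independently from their marginals, normalized with a softmax, and used for attention-weighted pooling, and that a linear classifier with weights `w`, `b` maps the pooled embedding to a bag probability. All names (`rbf_kernel`, `gp_marginals`, `agp_bag_predict`) and these design choices are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale ** 2)


def gp_marginals(H, Z, m, S_chol, jitter=1e-6):
    """Marginal mean/variance of a sparse variational GP at instance embeddings H.

    Assumption: q(u) = N(m, S_chol @ S_chol.T) at inducing inputs Z (unwhitened).
    """
    Kzz = rbf_kernel(Z, Z) + jitter * np.eye(len(Z))
    Kxz = rbf_kernel(H, Z)
    A = np.linalg.solve(Kzz, Kxz.T).T             # Kxz Kzz^{-1}, shape (N, M)
    mean = A @ m
    kxx = np.ones(len(H))                          # RBF diagonal (variance = 1)
    var = kxx - (A * Kxz).sum(1) + ((A @ S_chol) ** 2).sum(1)
    return mean, np.maximum(var, 1e-12)            # clip tiny negative round-off


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)        # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def agp_bag_predict(H, Z, m, S_chol, w, b, n_samples=100, seed=0):
    """Monte Carlo bag prediction with GP-sampled attention (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    mean, var = gp_marginals(H, Z, m, S_chol)
    eps = rng.standard_normal((n_samples, len(H)))
    f = mean + np.sqrt(var) * eps                  # (S, N) sampled attention logits
    att = softmax(f, axis=1)                       # weights sum to 1 per sample
    bag = att @ H                                  # (S, D) pooled bag embeddings
    p = 1.0 / (1.0 + np.exp(-(bag @ w + b)))       # (S,) bag-level probabilities
    return p.mean(), p.std(), att.mean(0)          # prediction, spread, mean attention


# Toy usage with random data (purely illustrative values).
rng = np.random.default_rng(42)
H = rng.standard_normal((8, 4))      # 8 instance embeddings of dimension 4
Z = rng.standard_normal((5, 4))      # 5 inducing inputs
m = rng.standard_normal(5)
S_chol = 0.1 * np.eye(5)
w, b = rng.standard_normal(4), 0.0
p_mean, p_std, att = agp_bag_predict(H, Z, m, S_chol, w, b)
print(f"bag probability {p_mean:.3f} +/- {p_std:.3f}")
print("mean attention per instance:", np.round(att, 3))
```

In this sketch, the mean attention weights play the role of instance-level explanations, and the spread `p_std` across samples is one simple way to expose predictive uncertainty; bags with a large spread are the ones it would flag as less reliable.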