Learning to Defer with Limited Expert Predictions

Recent research suggests that combining AI models with a human expert can exceed the performance of either alone. The combination of their capabilities is often realized by learning-to-defer algorithms, which enable the AI to learn whether to make a prediction for a particular instance or defer it to the human expert. However, to accurately learn which instances should be deferred to the human expert, a large number of expert predictions that accurately reflect the expert's capabilities are required, in addition to the ground truth labels needed to train the AI. This requirement, shared by many learning-to-defer algorithms, hinders their adoption in scenarios where the responsible expert regularly changes or where acquiring a sufficient number of expert predictions is costly. In this paper, we propose a three-step approach to reduce the number of expert predictions required to train learning-to-defer algorithms. It encompasses (1) training an embedding model with ground truth labels to generate feature representations, which serve as a basis for (2) training an expertise predictor model to approximate the expert's capabilities; (3) the expertise predictor then generates artificial expert predictions for instances not yet labeled by the expert, which the learning-to-defer algorithms require. We evaluate our approach on two public datasets: one with synthetically generated human experts and another from the medical domain containing real-world radiologists' predictions. Our experiments show that the approach allows the training of various learning-to-defer algorithms with a minimal number of human expert predictions. Furthermore, we demonstrate that even a small number of expert predictions per class is sufficient for these algorithms to exceed the performance the AI and the human expert can achieve individually.

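The three steps described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the class-centroid "embedding", the 1-nearest-neighbour expertise predictor, the binary "is the expert correct?" target, and every function and variable name below are assumptions chosen to keep the example small and dependency-free. The data and the expert are synthetic.

```python
import math

def dist(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fit_embedding(features, labels):
    """Step 1 (stand-in): learn class centroids from ground-truth labels."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def embed(centroids, x):
    """Represent an instance by its distance to each class centroid."""
    return [dist(x, centroids[y]) for y in sorted(centroids)]

def fit_expertise_predictor(embeddings, expert_correct):
    """Step 2 (stand-in): 1-NN over the few expert-labeled instances.

    Assumption: the predictor outputs whether the expert would be correct.
    """
    memory = list(zip(embeddings, expert_correct))
    def predict(e):
        return min(memory, key=lambda m: dist(e, m[0]))[1]
    return predict

def artificial_expert_predictions(predict, embeddings, labels, classes):
    """Step 3: generate expert predictions for instances the expert never saw."""
    out = []
    for e, y in zip(embeddings, labels):
        if predict(e):
            out.append(y)  # expert assumed correct: copy the true label
        else:
            out.append(next(c for c in classes if c != y))  # some wrong label
    return out

# Synthetic data: two well-separated classes on a small grid.
classes = [0, 1]
offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
features = offsets + [(5 + dx, 5 + dy) for dx, dy in offsets]
labels = [0] * len(offsets) + [1] * len(offsets)

def synthetic_expert(true_label):
    """Hypothetical expert who always answers class 0 (right on 0, wrong on 1)."""
    return 0

centroids = fit_embedding(features, labels)              # step 1
embeddings = [embed(centroids, x) for x in features]

labeled = [0, 1, 9, 10]                                  # two expert labels per class
unlabeled = [i for i in range(len(features)) if i not in labeled]
expert_correct = [synthetic_expert(labels[i]) == labels[i] for i in labeled]
predict = fit_expertise_predictor(                       # step 2
    [embeddings[i] for i in labeled], expert_correct)

artificial = artificial_expert_predictions(              # step 3
    predict,
    [embeddings[i] for i in unlabeled],
    [labels[i] for i in unlabeled],
    classes)
print(artificial)
```

In a full pipeline, the artificial predictions would be combined with the few real expert predictions and passed to a learning-to-defer algorithm, which this sketch does not implement. With more than two classes, the "some wrong label" choice in step 3 would need a more careful design than picking the first differing class.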