This paper addresses some of the challenges in adopting machine learning in insurance, in a context where model deployment is becoming democratized. The first challenge is reducing the labelling effort (and hence focusing on data quality) with the help of active learning, a feedback loop between model inference and an oracle: because unlabeled data is usually abundant in insurance, active learning can be a significant asset in reducing the labelling cost. To that end, this paper outlines several classical active learning methodologies before studying their empirical impact on both synthetic and real datasets. Another key challenge in insurance is fairness in model inferences. We introduce a post-processing fairness method for multi-class tasks and integrate it into this active learning framework to address both issues jointly. Finally, numerical experiments on unfair datasets show that the proposed setup offers a good compromise between model precision and fairness.
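The query step of the feedback loop described above can be illustrated with a minimal sketch. It assumes entropy-based uncertainty sampling, one classical active learning strategy; the abstract does not say which strategies the paper studies, so this is an illustration and not the authors' method. The names `entropy`, `select_queries`, `pool_probs`, and `budget`, and the probabilities themselves, are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled samples.

    Assumption: uncertainty is measured by predictive entropy; other
    classical criteria (margin, least confidence) would slot in here.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled samples over three classes.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, hence most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the samples that would be sent to the oracle for labelling.
to_label = select_queries(pool_probs, budget=2)
```

In a full loop, the samples indexed by `to_label` would be labelled by the oracle, moved into the training set, and the model retrained before the next query round.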