As machine learning models are increasingly deployed in high-stakes domains such as legal and financial decision-making, there has been growing interest in post-hoc methods for generating counterfactual explanations. Such explanations provide individuals adversely impacted by predicted outcomes (e.g., an applicant denied a loan) with recourse -- i.e., a description of how they can change their features to obtain a positive outcome. We propose a novel algorithm that leverages adversarial training and PAC confidence sets to learn models that theoretically guarantee recourse to affected individuals with high probability. We demonstrate the efficacy of our approach with extensive experiments on real data.
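To make the notion of recourse concrete, the sketch below computes the smallest (L2) feature change that flips a denied prediction to a positive one. It is an illustration only and not the algorithm proposed here: it assumes a hypothetical linear classifier `sign(w·x + b)` with a nonzero weight vector, and it involves neither adversarial training nor PAC confidence sets. The function name `linear_recourse`, the weights, and the applicant's features are all made up for the example.

```python
def linear_recourse(x, w, b, margin=1e-3):
    """Return the smallest L2 change to x that makes w.x + b positive.

    Assumes a hypothetical linear model with a nonzero weight vector w.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score > 0:
        # Already a positive outcome; no change is needed.
        return [0.0] * len(x)
    # Move along w (the shortest path to the decision boundary),
    # far enough to end up `margin` past it.
    norm_sq = sum(wi * wi for wi in w)
    step = (margin - score) / norm_sq
    return [step * wi for wi in w]


# Hypothetical applicant with features [income, debt] (arbitrary units).
w = [1.0, -2.0]
b = -3.0
x = [4.0, 1.0]

old_score = sum(wi * xi for wi, xi in zip(w, x)) + b   # negative: denied
delta = linear_recourse(x, w, b)
x_new = [xi + di for xi, di in zip(x, delta)]
new_score = sum(wi * xi for wi, xi in zip(w, x_new)) + b  # positive: approved
```

Here `delta` is the recourse: it tells the applicant to raise income and lower debt. For nonlinear models no such closed form is available in general, and recourse may not exist for every individual, which is the gap that a guarantee of recourse is meant to address.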