Recent work has shown that models trained to the same objective, and which achieve similar measures of accuracy on consistent test data, may nonetheless behave very differently on individual predictions. This inconsistency is undesirable in high-stakes contexts, such as medical diagnosis and finance. We show that this inconsistent behavior extends beyond predictions to feature attributions, which may likewise have negative implications for the intelligibility of a model and for one's ability to find recourse for its subjects. We then introduce selective ensembles to mitigate such inconsistencies by applying hypothesis testing to the predictions of a set of models trained using randomly selected starting conditions; importantly, selective ensembles can abstain in cases where a consistent outcome cannot be achieved up to a specified confidence level. We prove that prediction disagreement between selective ensembles is bounded, and empirically demonstrate that selective ensembles achieve consistent predictions and feature attributions while maintaining low abstention rates. On several benchmark datasets, selective ensembles reach zero inconsistently predicted points, with abstention rates as low as 1.5%.
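To make the mechanism concrete, below is a minimal sketch of the prediction rule described above. It assumes a list of trained models exposing a `predict` method and, as one plausible instantiation of the hypothesis test, uses a two-sided binomial (sign) test on the top two vote counts; the specific test, the `alpha` threshold, and the `None`-as-abstention convention are illustrative assumptions, not the authors' exact implementation.

```python
from collections import Counter
from scipy.stats import binomtest

def selective_ensemble_predict(models, x, alpha=0.05):
    """Predict the plurality label of the ensemble, or abstain (return None)
    when the plurality is not statistically significant at level alpha."""
    # One predicted label per constituent model, each trained from a
    # different randomly selected starting condition.
    votes = Counter(m.predict(x) for m in models)
    top_two = votes.most_common(2)
    top_label, top_count = top_two[0]
    runner_up_count = top_two[1][1] if len(top_two) > 1 else 0
    # Null hypothesis: the top two labels are equally likely, so the top
    # label's count follows Binomial(top_count + runner_up_count, 0.5).
    p_value = binomtest(top_count, top_count + runner_up_count, 0.5).pvalue
    if p_value <= alpha:
        return top_label
    return None  # abstain: no label wins at the requested confidence level
```

Under this rule, unanimous or near-unanimous ensembles predict with high confidence, while points whose vote split could plausibly arise from chance trigger abstention, which is what keeps disagreement between independently trained selective ensembles bounded.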