Understanding model performance on unlabeled data is a fundamental challenge in developing, deploying, and maintaining AI systems. Model performance is typically evaluated using test sets or periodic manual quality assessments, both of which require laborious manual data labeling. Automated performance prediction techniques aim to mitigate this burden, but potential inaccuracy and a lack of trust in their predictions have prevented their widespread adoption. We address this core problem of performance prediction uncertainty with a method to compute prediction intervals for model performance. Our methodology uses transfer learning to train an uncertainty model that estimates the uncertainty of model performance predictions. We evaluate our approach across a wide range of drift conditions and show substantial improvement over competitive baselines. We believe this result makes prediction intervals, and performance prediction in general, significantly more practical for real-world use.
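A minimal sketch of the general idea, not the paper's implementation: a performance predictor estimates a model's accuracy on an unlabeled dataset from meta-features, and a second uncertainty model, fit on the predictor's residuals, supplies the half-width of a prediction interval. The meta-features, the gradient-boosted regressors, and the 1.96 multiplier are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic "meta-features" describing unlabeled datasets (e.g., drift statistics,
# confidence-score summaries) and the base model's true accuracy on each dataset.
X_meta = rng.normal(size=(500, 8))
true_accuracy = np.clip(0.9 - 0.1 * np.abs(X_meta[:, 0]) + 0.02 * rng.normal(size=500), 0, 1)

# 1) Performance predictor: maps meta-features -> predicted accuracy.
perf_model = GradientBoostingRegressor().fit(X_meta, true_accuracy)

# 2) Uncertainty model: maps the same meta-features -> expected size of the
#    performance predictor's error. (In practice the residuals would come from
#    held-out data rather than the training set used here.)
residuals = np.abs(true_accuracy - perf_model.predict(X_meta))
uncert_model = GradientBoostingRegressor().fit(X_meta, residuals)

# 3) Prediction interval for a new, unlabeled dataset: point estimate +/- a
#    multiple of the predicted error (1.96 assumes roughly Gaussian errors).
x_new = rng.normal(size=(1, 8))
point = perf_model.predict(x_new)[0]
half_width = 1.96 * uncert_model.predict(x_new)[0]
print(f"predicted accuracy: {point:.3f}  "
      f"interval: [{point - half_width:.3f}, {point + half_width:.3f}]")
```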