Monitoring machine learning systems after deployment is critical to ensuring their reliability. Of particular importance is monitoring performance across all data subgroups (subpopulations). In practice, this can be prohibitively expensive: the number of data subgroups grows exponentially with the number of input features, and labeling data to evaluate each subgroup's performance is costly. In this paper, we propose an efficient framework for monitoring the subgroup performance of machine learning systems. Specifically, we aim to find the data subgroup with the worst performance using a limited amount of labeled data. We formulate this task as an optimization problem with an expensive black-box objective function and propose using Bayesian optimization to solve it. Our experimental results on various real-world datasets and machine learning systems show that the proposed framework can retrieve the worst-performing data subgroup effectively and efficiently.
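To make the problem setting concrete, the following is a minimal sketch of searching for the worst-performing subgroup under a fixed labeling budget. It is not the framework proposed in the paper: instead of Bayesian optimization it uses Thompson sampling with Beta posteriors, a simpler Bayesian sequential strategy that only works when the subgroups can be enumerated. The names `find_worst_subgroup` and `query_label`, the three binary features, and the simulated error rates are all hypothetical and chosen purely for illustration.

```python
import random
from itertools import product


def find_worst_subgroup(subgroups, query_label, budget, seed=0):
    """Spend `budget` label queries to locate the subgroup with the highest error rate.

    Illustrative stand-in (Thompson sampling), not the paper's Bayesian optimization.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior on each subgroup's error rate, stored as [alpha, beta].
    posterior = {g: [1, 1] for g in subgroups}
    for _ in range(budget):
        # Draw a plausible error rate per subgroup and query the one that looks worst.
        g = max(subgroups, key=lambda s: rng.betavariate(*posterior[s]))
        is_error = query_label(g)  # the expensive step: one labeled example
        posterior[g][0 if is_error else 1] += 1
    # Report the subgroup with the highest posterior mean error rate.
    return max(subgroups, key=lambda s: posterior[s][0] / sum(posterior[s]))


# Hypothetical simulation: 3 binary features give 2**3 = 8 subgroups.
subgroups = list(product([0, 1], repeat=3))
true_error = {g: 0.05 for g in subgroups}  # made-up error rates
true_error[(1, 0, 1)] = 0.60               # planted worst subgroup

oracle_rng = random.Random(1)


def query_label(g):
    # Stands in for labeling one sample from subgroup g and checking the model's prediction.
    return oracle_rng.random() < true_error[g]


worst = find_worst_subgroup(subgroups, query_label, budget=600)
print(worst)
```

Because the number of subgroups doubles with each added binary feature, enumerating them as done here stops being feasible quickly, which is the motivation for treating the search as black-box optimization rather than exhaustive evaluation.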