Did the Model Change? Efficiently Assessing Machine Learning API Shifts

Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and how the ML model has changed. Model shifts can affect downstream application performance and also create oversight issues (e.g. if consistency is desired). In this paper, we initiate a systematic investigation of ML API shifts. We first quantify the performance shifts from 2020 to 2021 of popular ML APIs from Google, Microsoft, Amazon, and others on a variety of datasets. We identified significant model shifts in 12 out of 36 cases we investigated. Interestingly, we found several datasets where the API's predictions became significantly worse over time. This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant. Monitoring confusion matrix shifts using standard random sampling can require a large number of samples, which is expensive as each API call costs a fee. We propose a principled adaptive sampling algorithm, MASA, to efficiently estimate confusion matrix shifts. MASA can accurately estimate the confusion matrix shifts in commercial ML APIs using up to 90% fewer samples compared to random sampling. This work establishes ML API shifts as an important problem to study and provides a cost-effective approach to monitor such shifts.
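To make the estimation problem concrete, the following is a minimal sketch of the uniform random sampling baseline that the abstract compares against. It is not MASA, whose adaptive sampling procedure is not described here. The sketch assumes the confusion matrix is a joint-frequency matrix (entry [i][j] is the fraction of samples with true label i and predicted label j), that the old API's predictions are already cached, and that each call to the new API costs a fee. The names `confusion_matrix`, `estimate_shift`, and `query_new_api` are hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def confusion_matrix(labels: Sequence[int], preds: Sequence[int],
                     num_classes: int) -> Matrix:
    """Joint-frequency confusion matrix (assumed definition):
    entry [i][j] is the fraction of samples with true label i
    and predicted label j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for y, p in zip(labels, preds):
        counts[y][p] += 1
    n = len(labels)
    return [[c / n for c in row] for row in counts]


def estimate_shift(labels: Sequence[int], old_preds: Sequence[int],
                   query_new_api: Callable[[int], int],
                   num_classes: int, budget: int, seed: int = 0) -> Matrix:
    """Estimate (new confusion matrix - old confusion matrix) from
    `budget` uniformly sampled data points. This is plain random
    sampling, not the paper's adaptive algorithm."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(labels)), budget)  # sample without replacement
    ys = [labels[i] for i in idx]
    old = [old_preds[i] for i in idx]
    new = [query_new_api(i) for i in idx]  # one paid API call per sample
    c_old = confusion_matrix(ys, old, num_classes)
    c_new = confusion_matrix(ys, new, num_classes)
    return [[c_new[i][j] - c_old[i][j] for j in range(num_classes)]
            for i in range(num_classes)]


# Toy data: the updated API flips one class-0 prediction to class 1.
labels = [0, 0, 1, 1]
old_preds = [0, 0, 1, 1]
new_preds = [0, 1, 1, 1]  # stand-in for responses from the updated API
shift = estimate_shift(labels, old_preds, lambda i: new_preds[i],
                       num_classes=2, budget=4)
```

In the toy data the budget equals the dataset size, so the estimate is exact. With a smaller budget the estimate becomes noisy, which is the cost that an adaptive scheme such as MASA aims to reduce.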
