Post-deployment monitoring of the performance of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution. Here we propose a novel approach, MLDemon, for ML DEployment MONitoring. MLDemon integrates both unlabeled features and a small number of on-demand labeled examples over time to produce a real-time estimate of the ML model's current performance on a given data stream. Subject to budget constraints, MLDemon decides when to acquire additional, potentially costly, supervised labels to verify the model. On temporal datasets with diverse distribution drifts and models, MLDemon substantially outperforms existing monitoring approaches. Moreover, we provide theoretical analysis showing that MLDemon is minimax rate optimal up to logarithmic factors and is provably robust against broad distribution drifts, whereas prior approaches are not.
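For intuition, the sketch below illustrates one possible budget-constrained monitoring loop of the kind the abstract describes: track a drift statistic on the unlabeled feature stream, spend a label from the budget only when drift looks large, and maintain a running accuracy estimate from the verified predictions. This is a minimal, hypothetical illustration; the drift statistic, window sizes, threshold, and query rule here are assumptions for exposition, not MLDemon's actual policy or estimator.

```python
import numpy as np

def monitor_stream(stream, model, label_budget, window=50, drift_threshold=0.1):
    """Hypothetical sketch of a label-efficient deployment monitor.

    `stream` yields (x, get_label) pairs, where `get_label` is a callable
    simulating an on-demand (costly) label query. All names and the drift
    heuristic are illustrative assumptions, not the paper's algorithm.
    """
    reference = []          # feature window from the start of deployment
    recent = []             # sliding window of the most recent features
    labeled_correct = []    # outcomes of the labels we paid for
    accuracy_estimates = [] # running estimate of current model accuracy

    for t, (x, get_label) in enumerate(stream):
        y_hat = model(x)
        recent.append(x)
        if t < window:
            reference.append(x)
        recent = recent[-window:]

        # Crude unlabeled drift statistic: mean feature shift between the
        # reference window and the most recent window.
        drift = np.linalg.norm(
            np.mean(recent, axis=0) - np.mean(reference, axis=0)
        )

        # Query a supervised label only when drift is large and budget remains.
        if drift > drift_threshold and label_budget > 0:
            label_budget -= 1
            labeled_correct.append(y_hat == get_label())

        if labeled_correct:
            accuracy_estimates.append(float(np.mean(labeled_correct)))
    return accuracy_estimates
```

In this sketch, the monitor adapts its labeling rate to the stream: stable periods consume no budget, while apparent drift triggers verification, which is the trade-off the abstract's budget constraint formalizes.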