While most work on evaluating machine learning (ML) models focuses on computing accuracy on batches of data, tracking accuracy alone in a streaming setting (i.e., unbounded, timestamp-ordered datasets) fails to appropriately identify when models are performing unexpectedly. In this position paper, we discuss how the nature of streaming ML problems introduces new real-world challenges (e.g., delayed arrival of labels) and recommend additional metrics to assess streaming ML performance.