The literature on provable robustness in machine learning has primarily focused on static prediction problems, such as image classification, in which input samples are assumed to be independent and model performance is measured as an expectation over the input distribution. Robustness certificates are derived for individual input instances under the assumption that the model is evaluated on each instance separately. However, in many deep learning applications, such as online content recommendation and stock market analysis, models use historical data to make predictions. Robustness certificates based on the assumption of independent input samples are not directly applicable in such scenarios. In this work, we focus on the provable robustness of machine learning models in the context of data streams, where inputs are presented as a sequence of potentially correlated items. We derive robustness certificates for models that use a fixed-size sliding window over the input stream. Our guarantees hold for the average model performance across the entire stream and are independent of stream size, making them suitable for large data streams. We perform experiments on speech detection and human activity recognition tasks and show that our certificates yield meaningful performance guarantees against adversarial perturbations.
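To make the evaluation setting concrete, the sketch below shows what "average model performance over a fixed-size sliding window" means operationally: one prediction per window position, averaged over the whole stream. This is a minimal illustration of the setting only, not the certification method; `model`, `stream`, `labels`, and `window_size` are hypothetical names introduced here for exposition.

```python
from collections import deque

def stream_average_accuracy(model, stream, labels, window_size):
    """Average per-window accuracy of `model` over a data stream.

    A fixed-size sliding window advances one item at a time; the model
    makes one prediction per full window, and performance is averaged
    across all window positions (the stream-level quantity that the
    certificates in this setting would bound).
    """
    window = deque(maxlen=window_size)  # fixed-size sliding window
    correct, total = 0, 0
    for item, label in zip(stream, labels):
        window.append(item)
        if len(window) < window_size:
            continue  # skip until the first window is full
        prediction = model(list(window))  # one prediction per window position
        correct += int(prediction == label)
        total += 1
    return correct / total if total else 0.0
```

Note that because consecutive windows overlap, a single perturbed stream item affects up to `window_size` predictions, which is why certificates assuming independent inputs do not transfer directly to this setting.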