In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts: subspace embedding, low-rank approximation, and coreset construction. For regression and other tasks in numerical linear algebra, we consider the row-arrival streaming model. Our results rest on a simple but powerful observation: many importance-sampling-based algorithms give rise to adversarial robustness, in contrast to sketching-based algorithms, which are prevalent in the streaming literature but are vulnerable to adversarial attacks. In addition, we show that the well-known merge-and-reduce paradigm in streaming is adversarially robust. Since the merge-and-reduce paradigm allows coreset constructions in the streaming setting, we obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA), and non-negative matrix factorization. To the best of our knowledge, these are the first adversarially robust results for these problems, yet they require no new algorithmic implementations. Finally, we empirically confirm the robustness of our algorithms against various adversarial attacks and demonstrate that, by contrast, some common existing algorithms are not robust. (Abstract shortened to meet arXiv limits)
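To make the merge-and-reduce paradigm mentioned above concrete, the following is a minimal Python sketch of its bookkeeping, not the paper's construction. The names `MergeAndReduce`, `reduce_coreset`, and `block_size` are hypothetical, and the reduce step is a placeholder (weight-proportional sampling with uniform reweighting) standing in for a real sensitivity-based coreset routine. The sketch illustrates only the binary-counter structure that keeps logarithmically many summaries; it does not by itself demonstrate adversarial robustness.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Weighted = Tuple[Point, float]  # (point, weight)


def reduce_coreset(points: List[Weighted], size: int,
                   rng: random.Random) -> List[Weighted]:
    """Placeholder reduce step (an assumption, not the paper's method).

    Samples `size` points with probability proportional to weight and gives
    each sample an equal share of the total weight. A real coreset
    construction would sample by sensitivity (importance) scores instead.
    """
    if len(points) <= size:
        return list(points)
    total = sum(w for _, w in points)
    chosen = rng.choices(points, weights=[w for _, w in points], k=size)
    return [(p, total / size) for p, _ in chosen]


class MergeAndReduce:
    """Keeps at most one summary per level, like the bits of a binary counter."""

    def __init__(self, block_size: int, seed: int = 0) -> None:
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.buffer: List[Weighted] = []                   # raw points not yet summarized
        self.levels: List[Optional[List[Weighted]]] = []   # one slot per level

    def insert(self, point: Point) -> None:
        """Process one stream update; each raw point has weight 1."""
        self.buffer.append((point, 1.0))
        if len(self.buffer) == self.block_size:
            self._carry(self.buffer)
            self.buffer = []

    def _carry(self, summary: List[Weighted]) -> None:
        """Merge equal-level summaries and reduce, carrying upward."""
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            merged = self.levels[level] + summary                          # merge
            summary = reduce_coreset(merged, self.block_size, self.rng)    # reduce
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = summary

    def summary(self) -> List[Weighted]:
        """Union of the buffer and all stored level summaries."""
        out = list(self.buffer)
        for stored in self.levels:
            if stored is not None:
                out.extend(stored)
        return out


if __name__ == "__main__":
    mr = MergeAndReduce(block_size=16, seed=1)
    for i in range(1000):
        mr.insert((float(i),))
    coreset = mr.summary()
    print(len(coreset), sum(w for _, w in coreset))
```

In this sketch the stored summary stays far smaller than the stream while the total weight is preserved, so weighted cost estimates computed on the summary remain on the scale of the full data. Swapping `reduce_coreset` for a problem-specific coreset routine is the intended extension point.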