Outliers occur widely in big-data applications and may severely affect statistical estimation and inference. In this paper, we introduce a framework of outlier-resistant estimation that robustifies an arbitrarily given loss function. It is closely connected to the method of trimming and includes explicit outlyingness parameters for all samples, which in turn facilitate computation, theory, and parameter tuning. To tackle the issues of nonconvexity and nonsmoothness, we develop scalable algorithms that are easy to implement and enjoy guaranteed fast convergence. In particular, a new technique is proposed to relax the requirement on the starting point, so that on regular datasets the number of data resamplings can be substantially reduced. Based on combined statistical and computational treatments, we are able to perform nonasymptotic analysis beyond M-estimation. The resulting resistant estimators, though not necessarily globally or even locally optimal, enjoy minimax rate optimality in both low and high dimensions. Experiments in regression, classification, and neural networks demonstrate excellent performance of the proposed methodology in the presence of gross outliers.
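To make the framework concrete, the following is a minimal sketch of one common instantiation, assuming a squared-error loss and a mean-shift formulation in which each sample i carries an explicit outlyingness parameter gamma_i, and the trimming connection is realized by quantile thresholding that retains only the q largest residuals. This is not the paper's exact algorithm; the names (`quantile_threshold`, `fit_resistant`) and the parameter `q` are illustrative only.

```python
# Sketch of outlier-resistant least squares with explicit outlyingness
# parameters, under the (assumed) mean-shift model y = X @ beta + gamma + noise:
# a nonzero gamma_i flags sample i as an outlier.
import numpy as np

def quantile_threshold(r, q):
    """Keep the q entries of r with largest magnitude; zero out the rest.
    This mimics trimming the q most outlying samples."""
    gamma = np.zeros_like(r)
    if q > 0:
        idx = np.argsort(np.abs(r))[-q:]
        gamma[idx] = r[idx]
    return gamma

def fit_resistant(X, y, q, n_iter=100):
    """Alternate between least squares on the shifted responses y - gamma
    and thresholding the residuals to update the outlyingness parameters."""
    gamma = np.zeros_like(y)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        gamma = quantile_threshold(y - X @ beta, q)
    return beta, gamma

# Toy example: 5 grossly corrupted responses out of 100 samples.
rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)
y[:5] += 10.0                        # gross outliers in the first 5 samples

beta_hat, gamma_hat = fit_resistant(X, y, q=10)
print(beta_hat)                      # close to beta_true despite the outliers
print(np.nonzero(gamma_hat)[0])      # nonzero gammas flag the corrupted samples
```

On such a dataset, the alternating scheme typically recovers the coefficients while the nonzero entries of gamma identify the corrupted samples, illustrating how the explicit outlyingness parameters support both estimation and outlier detection.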