Bagging is a useful method for large-scale statistical analysis, especially when computing resources are limited. Here we study the asymptotic properties of bagging estimators for $M$-estimation problems with massive datasets. We prove that the resulting estimator is consistent and asymptotically normal under appropriate conditions. The results show that the bagging estimator can achieve optimal statistical efficiency, provided that both the bagging subsample size and the number of subsamples are sufficiently large. Moreover, we derive a variance estimator for valid asymptotic inference. All theoretical findings are further verified by extensive simulation studies. Finally, we apply the bagging method to the US Airline Dataset to demonstrate its practical usefulness.
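As a rough illustration of the general idea, the following is a minimal Python sketch of a bagging estimator for one simple $M$-estimation problem. It assumes least squares as the loss, assumes each subsample is drawn without replacement and independently of the others, and averages the subsample estimates. The function names `ols_estimator` and `bagging_estimator` and the simulated data are hypothetical. This is not the authors' exact procedure, and it omits the variance estimator described above.

```python
import numpy as np


def ols_estimator(X, y):
    # Least-squares M-estimator: minimizes the sum of squared residuals.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def bagging_estimator(X, y, subsample_size, num_subsamples,
                      estimator=ols_estimator, seed=0):
    """Average an M-estimator over independently drawn subsamples.

    Assumption (hypothetical scheme): each subsample is drawn without
    replacement from the full data, independently across subsamples.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    estimates = []
    for _ in range(num_subsamples):
        # Only subsample_size rows are processed at a time,
        # which keeps per-fit memory and compute small.
        idx = rng.choice(N, size=subsample_size, replace=False)
        estimates.append(estimator(X[idx], y[idx]))
    estimates = np.asarray(estimates)          # shape: (num_subsamples, p)
    return estimates.mean(axis=0), estimates   # bagged estimate, per-subsample fits


# Usage on simulated data (hypothetical linear model without intercept).
data_rng = np.random.default_rng(42)
N, p = 100_000, 3
beta_true = np.array([1.0, -2.0, 0.5])
X = data_rng.standard_normal((N, p))
y = X @ beta_true + data_rng.standard_normal(N)

beta_bag, draws = bagging_estimator(X, y, subsample_size=2_000, num_subsamples=50)
print(beta_bag)
```

Here each fit only touches 2,000 of the 100,000 rows, which reflects the limited-resource setting the abstract targets. Per the abstract, both `subsample_size` and `num_subsamples` must be sufficiently large for the averaged estimate to reach full-data efficiency.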