For massive data stored across multiple machines, we propose a distributed subsampling procedure for composite quantile regression. We establish the consistency and asymptotic normality of the composite quantile regression estimator obtained from a general subsampling algorithm, and from these results we derive the optimal subsampling probabilities and the optimal allocation sizes under the L-optimality criterion. Because the optimal procedure depends on unknown quantities, we develop a two-step algorithm to approximate it. The proposed methods are illustrated through numerical experiments on simulated and real datasets.
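To make the objective concrete, the following is a minimal sketch of an inverse-probability-weighted composite quantile regression loss evaluated on a subsample. It assumes the standard check loss and a shared slope vector with one intercept per quantile level. The function names (`check_loss`, `weighted_cqr_loss`, `draw_subsample`) and the normalization by subsample size are illustrative choices, not taken from the paper, and the subsampling probabilities are treated as given inputs rather than the L-optimal ones derived there.

```python
import random


def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_cqr_loss(beta, intercepts, X, y, taus, probs):
    """Weighted CQR objective on a subsample (illustrative sketch).

    beta       : shared slope coefficients
    intercepts : one intercept b_k per quantile level tau_k
    X, y       : subsampled covariate rows and responses
    taus       : quantile levels tau_1, ..., tau_K
    probs      : sampling probability of each subsampled point
                 (assumed given; not the paper's optimal values)
    """
    total = 0.0
    for x_i, y_i, p_i in zip(X, y, probs):
        resid = y_i - sum(b * x for b, x in zip(beta, x_i))
        for b_k, tau_k in zip(intercepts, taus):
            # Weight each term by 1 / p_i to offset unequal sampling.
            total += check_loss(resid - b_k, tau_k) / p_i
    # Averaging over the subsample does not change the minimizer.
    return total / len(y)


def draw_subsample(probs, r, seed=0):
    """Draw r indices with replacement, with chances proportional to probs."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=r)
```

In a two-step scheme of the kind described above, one would typically draw a small pilot subsample first, use it to estimate the unknown quantities that the sampling probabilities depend on, and then draw the main subsample with the estimated probabilities before minimizing the weighted loss.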