The development of modern technology has enabled data collection of unprecedented size, which poses new challenges to many statistical estimation and inference problems. This paper studies the maximum score estimator of a semi-parametric binary choice model in a distributed computing environment, without pre-specifying the noise distribution. An intuitive divide-and-conquer estimator is computationally expensive and restricted by a non-regular constraint on the number of machines, due to the highly non-smooth nature of the objective function. We propose (1) a one-shot divide-and-conquer estimator that smooths the objective to relax the constraint, and (2) a multi-round estimator that removes the constraint entirely via iterative smoothing. We specify an adaptive choice of kernel smoother with a sequentially shrinking bandwidth to achieve a superlinear improvement of the optimization error over the iterations. The improved statistical accuracy per iteration is derived, and quadratic convergence up to the optimal statistical error rate is established. We further provide two generalizations: one handles heterogeneous datasets with covariate shift, and the other addresses high-dimensional problems where the parameter of interest is sparse.
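To make the one-shot estimator concrete, the following is a minimal sketch, not the paper's actual procedure: each machine maximizes a smoothed maximum score objective, in which the indicator of a correct classification is replaced by a sigmoid kernel with bandwidth h, and the local estimates are then averaged. All function names, the choice of sigmoid smoother, the grid-search optimizer, the simulated data, and the bandwidth h = n^(-1/5) are illustrative assumptions; they are not specified by the abstract.

```python
import numpy as np

def smoothed_score(b, X, y, h):
    # Smoothed maximum score objective: the indicator 1{x'beta >= 0}
    # is replaced by a smooth sigmoid K(x'beta / h) (illustrative choice).
    idx = (X[:, 0] + b * X[:, 1]) / h
    return np.mean((2 * y - 1) / (1 + np.exp(-idx)))

def local_estimate(X, y, h, grid):
    # Local maximization by grid search over the free coefficient,
    # with the usual scale normalization beta = (1, b).
    scores = [smoothed_score(b, X, y, h) for b in grid]
    return grid[int(np.argmax(scores))]

def one_shot_dc(datasets, h, grid):
    # One-shot divide-and-conquer: average the local smoothed estimates.
    return float(np.mean([local_estimate(X, y, h, grid) for X, y in datasets]))

# Simulated binary choice data: y = 1{x1 + b*x2 + eps >= 0},
# with median-zero noise whose distribution is never specified to the estimator.
rng = np.random.default_rng(0)
b_true, m, n = 0.5, 5, 4000       # m machines, n observations each
grid = np.linspace(-2.0, 2.0, 401)
datasets = []
for _ in range(m):
    X = rng.normal(size=(n, 2))
    eps = rng.logistic(size=n)
    y = (X[:, 0] + b_true * X[:, 1] + eps >= 0).astype(float)
    datasets.append((X, y))

est = one_shot_dc(datasets, h=n ** (-0.2), grid=grid)  # h = n^(-1/5), an assumed rate
```

The multi-round variant described above would repeat this step with a sequentially shrinking bandwidth, warm-starting each round from the previous averaged estimate; that refinement is omitted here for brevity.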