Subsampling is a computationally efficient approach to extracting information from massive data sets when computing resources are limited. After a subsample is taken from the full data, most available methods use an inverse probability weighted objective function to estimate the model parameters. This type of weighted estimator does not fully utilize the information in the selected subsample. In this paper, we propose to use the maximum sampled conditional likelihood estimator (MSCLE) based on the sampled data. We establish the asymptotic normality of the MSCLE and prove that its asymptotic variance-covariance matrix is the smallest among a class of asymptotically unbiased estimators, including the inverse probability weighted estimator. We further discuss the asymptotic results with the L-optimal subsampling probabilities and illustrate the estimation procedure with generalized linear models. Numerical experiments are provided to evaluate the practical performance of the proposed method.
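To make the contrast between the two estimators concrete, the following is a minimal sketch and not the paper's implementation. It assumes a logistic regression model, Poisson subsampling with inclusion probabilities that depend only on the response (not the L-optimal probabilities), and the standard form of the conditional likelihood given inclusion, P(y=1 | x, sampled) = p·π₁ / (p·π₁ + (1−p)·π₀), which for this model reduces to adding the offset log(π₁/π₀) to the linear predictor. All names (`fit_logistic`, `pi1`, `pi0`, `theta_true`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, weights=None, offset=None, max_iter=100, tol=1e-8):
    """Weighted logistic regression with an offset, via damped Newton steps."""
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    off = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)

    def nll(theta):
        # Weighted negative log-likelihood, written in a numerically stable form.
        eta = X @ theta + off
        return np.sum(w * (np.logaddexp(0.0, eta) - y * eta))

    theta = np.zeros(d)
    for _ in range(max_iter):
        eta = X @ theta + off
        p = np.exp(-np.logaddexp(0.0, -eta))          # stable sigmoid
        grad = X.T @ (w * (p - y))                    # gradient of nll
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        # Halve the step until the objective no longer increases.
        t, f0 = 1.0, nll(theta)
        while nll(theta - t * step) > f0 and t > 1e-8:
            t *= 0.5
        theta = theta - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return theta

# Simulated full data (illustrative values only).
rng = np.random.default_rng(0)
N = 100_000
theta_true = np.array([-3.0, 1.0, -0.5])
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ theta_true))))

# Assumed subsampling rule: keep every y=1, keep each y=0 with probability 0.05.
pi1, pi0 = 1.0, 0.05
pi = np.where(y == 1, pi1, pi0)
keep = rng.uniform(size=N) < pi
Xs, ys, pis = X[keep], y[keep], pi[keep]

# Inverse probability weighted estimator: weight each sampled point by 1/pi.
theta_ipw = fit_logistic(Xs, ys, weights=1.0 / pis)

# Conditional-likelihood estimator: unweighted fit with offset log(pi1/pi0).
offset = np.full(ys.shape[0], np.log(pi1 / pi0))
theta_cl = fit_logistic(Xs, ys, offset=offset)

print("true:", theta_true)
print("IPW :", theta_ipw)
print("CL  :", theta_cl)
```

Both fits target the full-data parameter. The weighted fit corrects for the sampling through the weights, while the conditional-likelihood fit corrects for it inside the model and leaves the sampled points unweighted.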