Subsampling is a computationally efficient approach to extracting information from massive data sets when computing resources are limited. After a subsample is taken from the full data, most available methods use an inverse probability weighted (IPW) objective function to estimate the model parameters. However, the IPW estimator does not fully utilize the information in the selected subsample. In this paper, we propose to use the maximum sampled conditional likelihood estimator (MSCLE) based on the sampled data. We establish the asymptotic normality of the MSCLE and prove that its asymptotic variance-covariance matrix is the smallest among a class of asymptotically unbiased estimators, including the IPW estimator. We further discuss the asymptotic results under the L-optimal subsampling probabilities and illustrate the estimation procedure with generalized linear models. Numerical experiments are provided to evaluate the practical performance of the proposed method.
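To make the contrast between the two estimators concrete, the following is a minimal sketch for logistic regression, not the paper's implementation. It assumes Poisson subsampling with inclusion probabilities proportional to |y - p_pilot(x)| ||x|| (an L-optimal-style choice built from a pilot estimate), and it treats the normalizing constant as fixed. Under these assumptions, Bayes' rule gives P(y = 1 | x, sampled) = sigmoid(x'beta + log(pi(x, 1) / pi(x, 0))), so the sampled conditional likelihood reduces to an unweighted logistic fit with a known offset, whereas the IPW estimator is a logistic fit weighted by 1/pi. The function names (`fit_logistic`, `run_demo`), the simulated data, and all default sizes are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, weights=None, offset=None, n_iter=100, tol=1e-10):
    """Newton-Raphson for logistic regression with optional weights and offset."""
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    off = np.zeros(n) if offset is None else np.asarray(offset, dtype=float)
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta + off)
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def run_demo(seed=0, N=100_000, r=2_000, n_pilot=500):
    """Simulate full data, subsample it, and return (truth, IPW fit, conditional-likelihood fit)."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([-1.0, 0.5, -0.5])  # illustrative values, not from the paper
    X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
    y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)

    # Pilot estimate from a small uniform subsample (assumption of this sketch).
    idx = rng.choice(N, size=n_pilot, replace=False)
    beta_pilot = fit_logistic(X[idx], y[idx])

    # Inclusion probabilities pi(x, y) proportional to |y - p_pilot(x)| * ||x||,
    # scaled so that the expected subsample size is about r.
    p_pilot = sigmoid(X @ beta_pilot)
    x_norm = np.linalg.norm(X, axis=1)
    scale = r / np.sum(np.abs(y - p_pilot) * x_norm)

    def pi(y_val, p, norm):
        return np.minimum(1.0, scale * np.abs(y_val - p) * norm)

    pi_obs = pi(y, p_pilot, x_norm)
    keep = rng.random(N) < pi_obs  # Poisson subsampling: independent inclusion
    Xs, ys = X[keep], y[keep]
    ps, ns = p_pilot[keep], x_norm[keep]

    # IPW estimator: weighted log-likelihood with weights 1 / pi.
    beta_ipw = fit_logistic(Xs, ys, weights=1.0 / pi_obs[keep])

    # Sampled conditional likelihood: unweighted fit with offset log(pi(x,1) / pi(x,0)).
    offset = np.log(pi(1.0, ps, ns)) - np.log(pi(0.0, ps, ns))
    beta_mscle = fit_logistic(Xs, ys, offset=offset)
    return beta_true, beta_ipw, beta_mscle


if __name__ == "__main__":
    truth, ipw, mscle = run_demo()
    print("true :", truth)
    print("IPW  :", ipw)
    print("MSCLE:", mscle)
```

Repeating `run_demo` over many seeds and comparing the empirical mean squared errors of the two estimates is one way to examine the variance ordering stated above in this simplified setting; a single run only shows that both estimates land near the true parameter.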