Statistical analysis of large datasets is challenging because computing devices have limited memory and computation times become excessive. The divide-and-conquer (DC) algorithm is an effective approach, but it has some limitations. Empirical likelihood is an important semiparametric and nonparametric statistical method for parameter estimation and statistical inference, and estimating equations build a bridge between empirical likelihood and traditional statistical methods, which has led to the wide use of empirical likelihood in various traditional statistical models. In this paper, we propose a novel approach to address the challenges posed by empirical likelihood with massive data, which we call split sample mean empirical likelihood (SSMEL). We show that the SSMEL estimator has the same estimation efficiency as the empirical likelihood estimator computed on the full dataset and that it preserves the important statistical property of Wilks' theorem, allowing our proposed approach to be used for statistical inference. The effectiveness of the proposed approach is illustrated using simulation studies and real data analysis.
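The abstract does not spell out how SSMEL is constructed, so the following is only an illustrative sketch and not the authors' algorithm. It assumes one plausible reading of the name: the sample is split into blocks, each block is reduced to its mean, and ordinary empirical likelihood for a scalar mean is applied to the block means. The function names `block_means` and `el_log_ratio`, the block count, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def block_means(x, n_blocks):
    """Split x into n_blocks equal-sized blocks and return each block's mean.

    Assumption for this sketch: trailing observations that do not fill a
    complete block are dropped.
    """
    x = np.asarray(x, dtype=float)
    m = len(x) // n_blocks
    return x[: m * n_blocks].reshape(n_blocks, m).mean(axis=1)


def el_log_ratio(z, n_iter=200):
    """Return -2 log empirical likelihood ratio for H0: E[z] = 0 (scalar z).

    The Lagrange multiplier lam solves sum(z_i / (1 + lam * z_i)) = 0.
    That function is strictly decreasing in lam, so bisection is enough.
    """
    z = np.asarray(z, dtype=float)
    if z.min() >= 0.0 or z.max() <= 0.0:
        return np.inf  # zero lies outside the convex hull of the data
    # All implied weights are positive only if 1 + lam * z_i > 0 for every i.
    lo, hi = -1.0 / z.max(), -1.0 / z.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(n_iter):
        lam = 0.5 * (lo + hi)
        if np.sum(z / (1.0 + lam * z)) > 0.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * z))


# Hypothetical demo: simulated data with true mean 1.0, reduced to 100 block means.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=10_000)
means = block_means(x, n_blocks=100)
stat = el_log_ratio(means - 1.0)
# Under a Wilks-type result the statistic is compared with a chi-square(1)
# quantile; 3.84 is approximately the 95% quantile.
print("statistic:", stat, "reject at 5% level:", stat > 3.84)
```

In this sketch only the block means need to be held in memory when the likelihood ratio is computed, which is the kind of saving the abstract is concerned with. Whether the paper's estimator takes exactly this form is not stated in the abstract.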