Statistical analysis of large datasets is challenging because computing devices have limited memory and computation time becomes excessive. The divide-and-conquer (DC) algorithm is an effective remedy, but it still has limitations for statistical inference. Empirical likelihood is an important semiparametric and nonparametric method for parameter estimation and statistical inference, and estimating equations build a bridge between empirical likelihood and traditional statistical methods, which has led to the wide use of empirical likelihood in traditional statistical models. In this paper, we propose a novel approach, called split sample mean empirical likelihood (SSMEL), to address the challenges that massive data pose for empirical likelihood; it offers a distinct perspective on solving big data problems. We show that the SSMEL estimator has the same estimation efficiency as the empirical likelihood estimator computed on the full dataset and that it preserves Wilks' theorem, so the proposed approach can be used for statistical inference. The effectiveness of the proposed approach is illustrated using simulation studies and real data analysis.