Model-assisted estimation through random forests in finite population sampling

In surveys, the interest lies in estimating finite population parameters such as population totals and means. In most surveys, some auxiliary information is available at the estimation stage. This information may be incorporated into the estimation procedures to increase their precision. In this article, we use random forests to estimate the functional relationship between the survey variable and the auxiliary variables. In recent years, random forests have become attractive because National Statistical Offices now have access to a variety of data sources, potentially comprising a large number of observations on a large number of variables. We establish the theoretical properties of model-assisted procedures based on random forests and derive corresponding variance estimators. A model-calibration procedure for handling multiple survey variables is also discussed. The results of a simulation study suggest that the proposed point and variance estimation procedures perform well in terms of bias, efficiency, and coverage of normal-based confidence intervals in a wide variety of settings. Finally, we apply the proposed methods to data on radio audiences collected by Médiamétrie, a French audience measurement company.
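
For orientation, a model-assisted estimator of a population total is commonly written in the difference form sketched below. The notation is an assumption made for illustration and does not come from the abstract: $U$ is the finite population, $s \subset U$ the sample, $\pi_k$ the first-order inclusion probability of unit $k$, $y_k$ the survey variable, $\mathbf{x}_k$ the auxiliary vector, and $\widehat{m}_{rf}$ a random forest prediction function fitted on the sample.

\[
\widehat{t}_{rf} \;=\; \sum_{k \in U} \widehat{m}_{rf}(\mathbf{x}_k) \;+\; \sum_{k \in s} \frac{y_k - \widehat{m}_{rf}(\mathbf{x}_k)}{\pi_k}
\]

The first sum adds up the predictions over the whole population, which requires the auxiliary variables to be known for every unit. The second sum is a design-weighted correction based on the sample residuals; it is the term that protects the estimator against a poorly fitting model. When $\widehat{m}_{rf} \equiv 0$, the expression reduces to the Horvitz-Thompson estimator $\sum_{k \in s} y_k / \pi_k$.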
