This paper promotes the use of random forests as versatile tools for estimating spatially disaggregated indicators when area-specific sample sizes are small. Small area estimators are predominantly conceptualized within the regression setting and rely on linear mixed models to account for the hierarchical structure of the survey data. In contrast, machine learning methods offer non-linear and non-parametric alternatives, combining excellent predictive performance with a reduced risk of model misspecification. Mixed effects random forests combine the advantages of regression forests with the ability to model hierarchical dependencies. This paper provides a coherent framework based on mixed effects random forests for estimating small area averages and proposes a non-parametric bootstrap estimator for assessing the uncertainty of the estimates. We illustrate the advantages of the proposed methodology using income data from the Mexican state of Nuevo León. Finally, the methodology is evaluated in model-based and design-based simulations that compare it to traditional regression-based approaches for estimating small area averages.
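To make the model class concrete, the following is a minimal sketch of a mixed effects random forest for unit-level data with area-specific random intercepts. The notation is illustrative and assumed here, not taken from the paper: $y_{ij}$ is the target variable for unit $j$ in area $i$, $x_{ij}$ the covariates, $f$ an unknown function fitted by a random forest, $u_i$ the area-level random effect, and $e_{ij}$ the unit-level error. The area-mean predictor in the second line additionally assumes that covariates are available for all $N_i$ population units in area $i$.

```latex
% Assumed unit-level model: forest-estimated fixed part plus random intercept
\begin{align}
  y_{ij} &= f(x_{ij}) + u_i + e_{ij},
  \qquad \mathrm{E}[u_i] = 0,\quad \mathrm{E}[e_{ij}] = 0, \\
% Plug-in predictor of the mean of area i (assumes population-level covariates)
  \hat{\mu}_i &= \frac{1}{N_i} \sum_{j=1}^{N_i} \hat{f}(x_{ij}) + \hat{u}_i .
\end{align}
```

Under this sketch, replacing $f(x_{ij})$ with a linear predictor $x_{ij}^{\top}\beta$ recovers the linear mixed model that the traditional regression-based small area estimators rely on.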