Reliable estimates of the spatial distribution of socio-economic indicators are essential for evidence-based policy-making. Because sample sizes are small for highly disaggregated domains, the accuracy of direct estimates is reduced. Small area estimation approaches offer a promising way to overcome this problem. In this work, we propose a small area methodology based on machine learning methods. The semi-parametric framework of mixed effects random forests combines the advantages of random forests (robustness against outliers and implicit model selection) with the ability to model hierarchical dependencies. Existing random forest-based methods require access to auxiliary information at the population level. We present a methodology that deals with the lack of population micro-data. Our strategy adaptively incorporates aggregated auxiliary information through calibration weights, based on empirical likelihood, for the estimation of area-level means. In addition to our point estimator, we provide a non-parametric bootstrap estimator to measure its uncertainty. The performance of the proposed point estimator and its uncertainty measure is studied in model-based simulations. Finally, the proposed methodology is applied to the 2011 Socio-Economic Panel and aggregate census information from the same year to estimate the average opportunity cost of care work for 96 regional planning regions in Germany.
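To illustrate the idea of empirical likelihood calibration weights, the following is a minimal sketch and not the implementation used in this work. It assumes the textbook formulation: choose weights $w_i$ that maximize $\sum_i \log w_i$ subject to $\sum_i w_i = 1$ and $\sum_i w_i x_i = \bar{x}_{pop}$, where $\bar{x}_{pop}$ is a known aggregate covariate mean for one area. The function name `el_calibration_weights`, the Newton solver, and the example data are all hypothetical choices made for illustration.

```python
import numpy as np


def el_calibration_weights(x, target_mean, max_iter=100, tol=1e-10):
    """Empirical-likelihood weights w_i > 0 with sum(w) = 1 and w @ x = target_mean.

    Hypothetical helper: solves the standard dual problem for the Lagrange
    multiplier with a damped Newton iteration. A solution exists only if
    target_mean lies inside the convex hull of the rows of x.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    z = x - np.asarray(target_mean, dtype=float)  # centred covariates
    n, p = z.shape
    lam = np.zeros(p)

    def objective(l):
        # Concave dual objective; -inf outside the feasible region.
        d = 1.0 + z @ l
        return np.sum(np.log(d)) if np.all(d > 0) else -np.inf

    for _ in range(max_iter):
        denom = 1.0 + z @ lam
        scaled = z / denom[:, None]
        grad = scaled.sum(axis=0)          # sum_i z_i / (1 + lam'z_i)
        if np.linalg.norm(grad) < tol:
            break
        hess = scaled.T @ scaled           # negative Hessian of the objective
        step = np.linalg.solve(hess, grad)
        # Halve the step until it stays feasible and does not decrease the objective.
        t, current = 1.0, objective(lam)
        while t > 1e-12 and objective(lam + t * step) < current:
            t /= 2.0
        lam = lam + t * step

    return 1.0 / (n * (1.0 + z @ lam))


# Illustrative use with simulated data (assumed values, not from the paper).
rng = np.random.default_rng(0)
x_sample = rng.normal(size=(50, 2))        # survey covariates in one area
x_pop_mean = np.array([0.1, -0.1])         # known aggregate covariate means
unit_predictions = rng.normal(size=50)     # stand-in for unit-level predictions

w = el_calibration_weights(x_sample, x_pop_mean)
area_mean = w @ unit_predictions           # calibrated area-level mean
```

In this sketch the weighted covariate means reproduce the aggregate values, so the area-level mean is a weighted average of unit-level predictions that is consistent with the available aggregates. How the weights are combined with the mixed effects random forest predictions and area-level random effects is not specified here.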