This paper proposes small area estimation methods that use generalized tree-based machine learning techniques to improve the estimation of disaggregated means in small areas from discrete survey data. Specifically, we present two approaches based on random forests: the Generalized Mixed Effects Random Forest (GMERF) and the Mixed Effects Random Forest (MERF), both tailored to address challenges associated with count outcomes, particularly overdispersion. Our analysis shows that the MERF, which does not assume a Poisson distribution when modeling the mean behavior of count data, excels in scenarios of severe overdispersion. Conversely, the GMERF performs best when the Poisson distributional assumptions are moderately met. Additionally, we introduce and evaluate three bootstrap methodologies (one parametric and two non-parametric) designed to assess the reliability of point estimators for area-level means. The methods are evaluated in model-based and design-based simulations and applied to a real-world dataset from the state of Guerrero in Mexico, demonstrating their robustness and potential for practical applications.