Many forecasting applications have a limited distributed target variable, which is zero for most observations and positive for the rest. The econometrics literature contains extensive research on statistical model building for limited distributed target variables. In particular, there are two-component model approaches, in which one model is built for the probability that the target is positive and a second model for the actual value of the target, given that it is positive. However, the econometric literature focuses on effect estimation and does not provide theory for predictive modeling. Nevertheless, some of its concepts, such as the two-component model approach and Heckman's sample selection correction, also appear in the predictive modeling literature without a sound theoretical foundation. In this paper, we theoretically analyze predictive modeling for limited dependent variables and derive best practices. By analyzing various real-world data sets, we also use the derived theoretical results to explain which predictive modeling approach works best in which application.
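As an illustration of the two-component idea, the sketch below combines the two models through the identity E[Y | x] = P(Y > 0 | x) * E[Y | Y > 0, x], which holds for a target that is either zero or positive. It is a minimal sketch and not the method analyzed in the paper: it assumes a single categorical feature and uses the simplest possible component models (a per-group frequency and a per-group mean). The function names `fit_two_part` and `predict_two_part` and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_two_part(groups, targets):
    """Fit a per-group two-component model (illustrative assumption).

    Returns a dict mapping each group to a pair:
    (estimated P(y > 0 | group), estimated E[y | y > 0, group]).
    """
    n_all = defaultdict(int)      # number of observations per group
    n_pos = defaultdict(int)      # number of positive observations per group
    sum_pos = defaultdict(float)  # sum of the positive targets per group
    for g, y in zip(groups, targets):
        n_all[g] += 1
        if y > 0:
            n_pos[g] += 1
            sum_pos[g] += y

    model = {}
    for g in n_all:
        p_positive = n_pos[g] / n_all[g]
        # If a group has no positive observation, the conditional mean is
        # undefined; 0.0 is used here as a placeholder.
        mean_positive = sum_pos[g] / n_pos[g] if n_pos[g] else 0.0
        model[g] = (p_positive, mean_positive)
    return model


def predict_two_part(model, group):
    """Predict E[y | group] as P(y > 0 | group) * E[y | y > 0, group]."""
    p_positive, mean_positive = model[group]
    return p_positive * mean_positive


# Toy data, made up for illustration: mostly zeros, some positive values.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
targets = [0.0, 0.0, 0.0, 8.0, 0.0, 3.0, 5.0, 4.0]

model = fit_two_part(groups, targets)
for g in sorted(model):
    print(g, model[g], predict_two_part(model, g))
```

With these saturated component models, the product reproduces the plain group mean of the target. In practice each component would typically be replaced by a richer model, for example a classifier for the probability and a regressor fitted on the positive observations only.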