The M5 competition uncertainty track aims at probabilistic forecasting of the sales of thousands of Walmart retail goods. We show that the M5 competition data exhibit strong overdispersion and sporadic demand, in particular many zero-demand observations. We discuss the resulting modeling issues for adequate probabilistic forecasting of such count data processes. Unfortunately, the majority of popular prediction methods used in the M5 competition (e.g. gradient boosting machines such as LightGBM and XGBoost) fail to address these data characteristics because of the objective functions they consider. Distributional forecasting provides a suitable modeling approach to overcome those problems. The GAMLSS framework allows flexible probabilistic forecasting using low-dimensional distributions. We illustrate how the GAMLSS approach can be applied to the M5 competition data by modeling the location and scale parameters of various distributions, e.g. the negative binomial distribution. Finally, we discuss software packages for distributional modeling and their drawbacks, such as the R package gamlss with its package extensions, and (deep) distributional forecasting libraries such as TensorFlow Probability.
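To make the overdispersion argument concrete, the following is a minimal Python sketch, not the GAMLSS procedure itself. It assumes synthetic gamma-Poisson demand rather than the actual M5 data, and it fits the negative binomial location (mean) and size parameters by a simple method of moments instead of regression on covariates. All function and variable names (`sample_poisson`, `nb_pmf`, `nb_quantile`, `sales`) are hypothetical and chosen for illustration only.

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1


def nb_pmf(k, mu, size):
    """Negative binomial pmf in mean/size form, with variance mu + mu**2 / size."""
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def nb_quantile(q, mu, size, max_k=10_000):
    """Smallest count k whose cumulative probability reaches q (capped at max_k)."""
    cdf = 0.0
    for k in range(max_k):
        cdf += nb_pmf(k, mu, size)
        if cdf >= q:
            return k
    return max_k


# Assumption: synthetic sporadic demand, a gamma-mixed Poisson with mean 1 and size 0.5.
rng = random.Random(0)
sales = [sample_poisson(rng.gammavariate(0.5, 2.0), rng) for _ in range(5000)]

mean = statistics.mean(sales)
var = statistics.variance(sales)
zero_share = sales.count(0) / len(sales)

# A dispersion index above 1 signals overdispersion relative to a Poisson model.
dispersion_index = var / mean
if var <= mean:
    raise ValueError("no overdispersion detected; the moment estimator below is undefined")

# Method-of-moments estimate of the size parameter (larger size means closer to Poisson).
size_hat = mean ** 2 / (var - mean)

# Compare how well each distribution reproduces the observed share of zero-demand days.
poisson_zero = math.exp(-mean)
nb_zero = nb_pmf(0, mean, size_hat)

# A predictive quantile, as required by quantile-based evaluation of probabilistic forecasts.
q95 = nb_quantile(0.95, mean, size_hat)

print(f"mean={mean:.3f} var={var:.3f} dispersion index={dispersion_index:.2f}")
print(f"zero share: observed={zero_share:.3f} Poisson={poisson_zero:.3f} NB={nb_zero:.3f}")
print(f"estimated size={size_hat:.3f}, 95% predictive quantile={q95}")
```

In a distributional regression setting such as GAMLSS, the constants `mean` and `size_hat` would be replaced by functions of covariates (for example item, store, or calendar effects), so that both location and scale can vary across series and over time.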