Log and square-root transformations of the target variable are routinely used in forecasting models that predict future sales. These transformations often yield better-performing models, but they also introduce a systematic negative bias (under-forecasting). In this paper, we demonstrate the existence of this bias, examine its root cause, and introduce two methods to correct for it. We conclude that the proposed bias-correction methods improve model performance (by up to 50%) and make a case for incorporating bias correction into the modeling workflow. We also experiment with the "Tweedie" family of cost functions, which circumvents the transformation bias by modeling sales directly. We conclude that Tweedie regression gives the best performance so far when modeling sales directly, making it a strong alternative to working with a transformed target variable.
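The under-forecasting described above can be illustrated with a minimal sketch. It does not reproduce the paper's models or its two correction methods; it only assumes a hypothetical log-normal sales distribution (the values of `mu` and `sigma` are made up) and shows that averaging on a transformed scale and then back-transforming lands below the mean of the original sales.

```python
import numpy as np

# Hypothetical data: sales drawn from a log-normal distribution.
# mu and sigma are illustrative assumptions, not values from the paper.
rng = np.random.default_rng(0)
mu, sigma = 3.0, 0.8
sales = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

true_mean = sales.mean()

# Best constant prediction on the log scale, naively back-transformed.
log_forecast = np.exp(np.log(sales).mean())

# Same idea for the square-root transformation.
sqrt_forecast = np.sqrt(sales).mean() ** 2

# Relative bias of each back-transformed forecast against the true mean.
log_bias = log_forecast / true_mean - 1.0
sqrt_bias = sqrt_forecast / true_mean - 1.0

print(f"mean sales:           {true_mean:.2f}")
print(f"log back-transform:   {log_forecast:.2f} (relative bias {log_bias:+.1%})")
print(f"sqrt back-transform:  {sqrt_forecast:.2f} (relative bias {sqrt_bias:+.1%})")
```

Both relative biases come out negative. This follows from Jensen's inequality: the exponential and the square are convex, so back-transforming an average taken on the transformed scale can never exceed the average on the original scale.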