Demand forecasting in competitive, uncertain business environments requires models tuned against multiple evaluation perspectives rather than through hyperparameter optimization based on a single metric. The single-metric approach prioritizes one error indicator, which can bias results when metrics give contradictory signals. To address this, the Hierarchical Evaluation Function (HEF) is proposed as a multi-metric framework for hyperparameter optimization that integrates explanatory power (R²), sensitivity to extreme errors (RMSE), and average accuracy (MAE). HEF was assessed on four widely recognized forecasting benchmark datasets: Walmart, M3, M4, and M5. Prediction models were optimized with Grid Search, Particle Swarm Optimization (PSO), and Optuna. Statistical analyses based on difference-of-proportions tests confirmed that HEF delivers superior results compared to a single-metric reference function regardless of the optimizer employed, with particular relevance for heterogeneous monthly time series (M3) and highly granular daily demand scenarios (M5). The findings indicate that HEF improves stability, generalization, and robustness at low computational cost, making it a reliable evaluation framework that enhances model selection, enables more accurate demand forecasts, and supports decision-making in dynamic, competitive business environments.
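To make the idea of a multi-metric objective concrete, the sketch below combines R², RMSE, and MAE into a single score that a hyperparameter optimizer could minimize. The abstract does not give the HEF formula, so this is not the HEF itself: the function name `composite_score`, the equal default weights, and the normalization by the standard deviation of the observed series are all assumptions chosen for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def composite_score(y_true, y_pred, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical multi-metric loss (lower is better).

    Assumption: the three metrics are merged as a weighted sum, with
    RMSE and MAE divided by the standard deviation of y_true so that
    all three terms are unitless. This is NOT the published HEF formula.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scale = np.std(y_true)
    w_r2, w_rmse, w_mae = weights
    return (
        w_r2 * (1.0 - r2(y_true, y_pred))
        + w_rmse * rmse(y_true, y_pred) / scale
        + w_mae * mae(y_true, y_pred) / scale
    )


# Illustrative data only: two candidate forecasts for the same series.
y = np.array([10.0, 12.0, 15.0, 11.0])
candidates = {
    "a": np.array([10.5, 11.5, 15.5, 10.5]),  # small errors everywhere
    "b": np.array([10.0, 12.0, 15.0, 17.0]),  # one large error
}
best = min(candidates, key=lambda name: composite_score(y, candidates[name]))
```

In a tuning loop, `composite_score` would take the place of a single-metric loss, for example as the return value of the objective function passed to an optimizer. The weights control how strongly extreme errors (RMSE) are traded off against average accuracy (MAE) and explanatory power (R²).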