The methodology discussed in this paper aims to improve the comprehensiveness and explanatory power of choice models used to forecast choice outcomes. To this end, we develop a data-driven method that uses machine learning procedures to identify the most effective representation of variables in empirical probability specifications for mode choice. The methodology is particularly useful in the presence of big data and an abundance of variables, where it can search through many candidate models. It does so through a sparse identification method that looks for the sparsest specification (a parsimonious model) in the domain of candidate functions, which gives the study potential applications in transportation planning and policy-making. Finally, this paper applies the method to synthetic choice data as a proof of concept. We perform two experiments and show that if the functional form used to generate the synthetic data lies in the domain of base functions, the methodology can recover it. Otherwise, the method raises a red flag by outputting small (near-zero) coefficients for the base functions.
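The two synthetic experiments described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a binary logit choice between two modes, a small hypothetical library of base functions of two attributes `x1` and `x2`, and an L1-penalized logit fitted by proximal gradient descent as a stand-in for the sparse identification step. All names, coefficient values, and the penalty weight `lam` are illustrative assumptions.

```python
import numpy as np

NAMES = ["x1", "x2", "x1^2", "x2^2", "x1*x2"]  # hypothetical base functions

def library(x1, x2):
    """Candidate base functions, standardized so the L1 penalty treats them equally."""
    F = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])
    return (F - F.mean(axis=0)) / F.std(axis=0)

def simulate_choices(utility, rng):
    """Draw binary choices from a logit model with the given utility difference."""
    p = 1.0 / (1.0 + np.exp(-utility))
    return (rng.random(utility.shape[0]) < p).astype(float)

def sparse_logit(Z, y, lam=0.02, n_iter=3000):
    """L1-penalized binary logit fitted by proximal gradient descent (ISTA)."""
    n, k = Z.shape
    A = np.column_stack([np.ones(n), Z])  # column 0 is the intercept
    # Step size = 1 / (upper bound on the Lipschitz constant of the gradient).
    step = 1.0 / (0.25 * np.linalg.eigvalsh(A.T @ A / n)[-1])
    w = np.zeros(k + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ w)))
        w = w - step * (A.T @ (p - y) / n)  # gradient step on the mean log-loss
        # Soft-threshold every coefficient except the intercept.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam, 0.0)
    return w[1:]  # coefficients on the standardized base functions

rng = np.random.default_rng(0)
n = 5000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
Z = library(x1, x2)

# Experiment 1: the generating form (x1 and x2^2) lies inside the library.
y_in = simulate_choices(-0.5 - 1.5 * x1 + 1.0 * x2**2, rng)
coef_in = sparse_logit(Z, y_in)

# Experiment 2: the generating form (an oscillating function) lies outside it.
y_out = simulate_choices(2.0 * np.cos(5.0 * x1), rng)
coef_out = sparse_logit(Z, y_out)

for name, a, b in zip(NAMES, coef_in, coef_out):
    print(f"{name:6s} in-library: {a:+.3f}   out-of-library: {b:+.3f}")
```

Under these assumptions, the first fit should place its weight on the two generating terms, while the second should leave every base function with a coefficient close to zero, which is the warning sign the abstract describes. The reported coefficients are on the standardized scale and are shrunk by the penalty, so they are not direct estimates of the generating parameters.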