In this paper, we propose a convex formulation for learning a logistic regression model (logit) with latent heterogeneous effects on sub-populations. In transportation, logistic regression and its variants are often interpreted as discrete choice models under utility theory (McFadden, 2001). Two prominent applications of logit models in the transportation domain are traffic accident analysis and choice modeling. In these applications, researchers often want to understand and capture individual variation under the same accident or choice scenario. The mixed-effect logistic regression (mixed logit) is a popular model among transportation researchers. Estimating the distribution of the mixed logit parameters requires solving a non-convex optimization problem with nested high-dimensional integrals, which is typically done with simulation-based optimization. Despite its popularity, the mixed logit approach to learning individual heterogeneity has several downsides. First, the parametric form of the distribution requires domain knowledge and assumptions imposed by the user, although this issue can be addressed to some extent by a non-parametric approach. Second, the optimization problems arising from parameter estimation for the mixed logit and its non-parametric extensions are non-convex, which leads to unstable model interpretation. Third, the simulation size in simulation-assisted estimation lacks finite-sample theoretical guarantees and is chosen somewhat arbitrarily in practice. To address these issues, we develop a formulation that models latent individual heterogeneity while preserving convexity and avoiding the need for simulation-based approximation. Our setup decomposes the parameters into a sparse homogeneous component shared across the population and low-rank heterogeneous parts for each individual.
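To make the sparse-plus-low-rank decomposition concrete, one possible shape of such an objective is sketched below. This is an illustrative assumption, not the formulation stated in the abstract: the symbols $\beta$, $\gamma_i$, $\Gamma$, $\lambda_1$, $\lambda_2$, the binary-response setup, and the choice of the $\ell_1$ and nuclear norms as convex surrogates for sparsity and low rank are all introduced here for illustration.

```latex
% Hypothetical notation (not from the abstract):
%   individuals i = 1,...,n with observations t = 1,...,T_i, and N = \sum_i T_i
%   x_{it} \in \mathbb{R}^d : covariates,  y_{it} \in \{0,1\} : binary response
%   \beta \in \mathbb{R}^d  : homogeneous (population-level) component
%   \gamma_i \in \mathbb{R}^d : heterogeneous component of individual i
%   \Gamma = [\gamma_1, \dots, \gamma_n] \in \mathbb{R}^{d \times n}
\begin{equation*}
\min_{\beta,\,\Gamma}\;
\frac{1}{N}\sum_{i=1}^{n}\sum_{t=1}^{T_i}
\Bigl[\log\bigl(1+\exp\bigl(x_{it}^{\top}(\beta+\gamma_i)\bigr)\bigr)
      - y_{it}\,x_{it}^{\top}(\beta+\gamma_i)\Bigr]
\;+\;\lambda_1\,\lVert\beta\rVert_1
\;+\;\lambda_2\,\lVert\Gamma\rVert_{*}
\end{equation*}
```

Under these assumptions, the logistic loss is convex in $(\beta, \Gamma)$ because each term is a convex function of the linear map $x_{it}^{\top}(\beta+\gamma_i)$, and both the $\ell_1$ norm and the nuclear norm $\lVert\cdot\rVert_{*}$ are convex. The whole problem is therefore convex and involves no integrals that would require simulation.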