Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, owing to their flexibility and the abundance of available statistical estimation and model selection tools. This flexibility comes from allowing the mixture weights (or gating functions) in the MoE model, as well as the experts (or component densities), to depend on the explanatory variables. This permits the modeling of data arising from more complex data-generating processes than the classical finite mixture and finite mixture-of-regressions models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, where the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view and, in particular, from a theoretical point of view, where the literature still lacks results addressing the curse of dimensionality for both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the Lasso estimation properties rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso penalty that ensures an $l_1$-oracle inequality satisfied by the Lasso estimator, with respect to the Kullback--Leibler loss.
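For concreteness, a minimal sketch of the model and penalized objective described above, written in notation we introduce here for illustration (the symbols $\gamma_k$, $\beta_k$, $\sigma_k$, $\lambda$ are assumptions, not taken verbatim from the text): the conditional density of a soft-max gated MoE with $K$ Gaussian experts, and its Lasso-penalized negative log-likelihood, may be written as
\begin{align*}
s_{\psi}(y \mid x) &= \sum_{k=1}^{K} \frac{\exp\!\left(\gamma_{k0} + x^\top \gamma_k\right)}{\sum_{l=1}^{K}\exp\!\left(\gamma_{l0} + x^\top \gamma_l\right)}\, \phi\!\left(y;\, \beta_{k0} + x^\top \beta_k,\, \sigma_k^2\right),\\[4pt]
\widehat{\psi}^{\,\mathrm{Lasso}} &\in \operatorname*{arg\,min}_{\psi} \left\{ -\frac{1}{n}\sum_{i=1}^{n} \log s_{\psi}(y_i \mid x_i) \;+\; \lambda \sum_{k=1}^{K} \left( \|\gamma_k\|_1 + \|\beta_k\|_1 \right) \right\},
\end{align*}
where $\phi(\cdot;\mu,\sigma^2)$ denotes the Gaussian density and $\lambda > 0$ is the regularization parameter whose lower bound governs the $l_1$-oracle inequality in Kullback--Leibler loss.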