Mixture of experts (MoE) provides a well-principled finite mixture model construction for prediction, allowing the gating network (mixture weights) to learn from the predictors (explanatory variables) together with the experts network (mixture component densities). We investigate the estimation properties of MoE models in a high-dimensional setting, where the number of predictors is much larger than the sample size, for which the literature lacks computational and, especially, theoretical results. We consider the class of finite MoE models with softmax gating functions and Gaussian regression experts, and focus on the theoretical properties of their $l_1$-regularized estimation via the Lasso. We provide a lower bound on the regularization parameter of the Lasso penalty that ensures an $l_1$-oracle inequality is satisfied by the Lasso estimator with respect to the Kullback--Leibler loss. We further state an $l_1$-ball oracle inequality for the $l_1$-penalized maximum likelihood estimator obtained from the model selection procedure.
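To make the setting concrete, the following is a minimal sketch of the softmax-gated Gaussian MoE density and the Lasso criterion; the notation ($K$ experts, gating weights $w_k$, expert coefficients $\beta_k$, variances $\sigma_k^2$, sample $(X_i, Y_i)_{i=1}^{n}$, regularization level $\lambda$) is introduced here for illustration and need not match the paper's conventions:
\[
s_\psi(y \mid x) \;=\; \sum_{k=1}^{K} \frac{\exp\!\big(w_{k0} + w_k^\top x\big)}{\sum_{l=1}^{K}\exp\!\big(w_{l0} + w_l^\top x\big)}\,
\phi\!\big(y;\, \beta_{k0} + \beta_k^\top x,\, \sigma_k^2\big),
\]
where $\phi(\cdot;\mu,\sigma^2)$ denotes the Gaussian density, and the Lasso estimator penalizes the negative log-likelihood by the $l_1$ norm of the gating and expert coefficients,
\[
\widehat{\psi}^{\mathrm{Lasso}}(\lambda) \;\in\; \operatorname*{arg\,min}_{\psi}\;
\Bigg\{-\frac{1}{n}\sum_{i=1}^{n}\log s_\psi(Y_i \mid X_i)
\;+\; \lambda \sum_{k=1}^{K}\big(\lVert w_k\rVert_1 + \lVert \beta_k\rVert_1\big)\Bigg\}.
\]
In this notation, the $l_1$-oracle inequality controls the Kullback--Leibler risk of $\widehat{\psi}^{\mathrm{Lasso}}(\lambda)$ provided $\lambda$ exceeds the stated lower bound.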