In many modern regression applications, the response consists of multiple categorical random variables whose probability mass is a function of a common set of predictors. In this article, we propose a new method for modeling such a probability mass function in settings where the number of response variables, the number of categories per response, and the dimension of the predictor are large. We introduce a latent variable model which implies a low-rank tensor decomposition of the conditional probability tensor. This model is based on the connection between the conditional independence of responses, or lack thereof, and the rank of their conditional probability tensor. Conveniently, our model can be interpreted in terms of a mixture of regressions and can thus be fit using maximum likelihood. We derive an efficient and scalable penalized expectation maximization algorithm to fit this model and examine its statistical properties. We demonstrate the encouraging performance of our method through both simulation studies and an application to modeling the functional classes of genes.
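To make the link between a latent class and a low-rank conditional probability tensor concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' parameterization: the multinomial-logistic (softmax) form of each component regression, the predictor-independent mixing weights, and the names `conditional_pmf_tensor`, `mix_weights`, and `coefs` are all hypothetical. Given a latent class, the responses are taken to be conditionally independent, so each class contributes one rank-one term, and the tensor for a fixed predictor is a weighted sum of R such terms.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def conditional_pmf_tensor(x, mix_weights, coefs):
    """Joint PMF of (Y_1, ..., Y_M) given x under a latent class model.

    Assumed form (hypothetical, for illustration only):
        P(y_1, ..., y_M | x) = sum_r w_r * prod_m softmax(x @ coefs[m][r])[y_m]

    x           : (p,) predictor vector
    mix_weights : (R,) latent class probabilities, summing to one
    coefs       : list of M arrays, the m-th of shape (R, p, K_m)
    """
    tensor = None
    for r, w_r in enumerate(mix_weights):
        # Rank-one term: outer product of the per-response PMFs for class r.
        component = np.asarray(w_r)
        for B_m in coefs:
            p_m = softmax(x @ B_m[r])  # shape (K_m,)
            component = np.multiply.outer(component, p_m)
        tensor = component if tensor is None else tensor + component
    return tensor


# Small synthetic example: 3 responses with 3, 4, and 2 categories.
rng = np.random.default_rng(0)
p, R = 5, 2
categories = (3, 4, 2)
x = rng.normal(size=p)
mix_weights = np.array([0.6, 0.4])
coefs = [rng.normal(size=(R, p, K)) for K in categories]

P = conditional_pmf_tensor(x, mix_weights, coefs)
```

In this sketch `P` has one axis per response and sums to one. Because it is a sum of R rank-one terms, any matrix unfolding of `P` has rank at most R; with R = 1 the responses would be conditionally independent given the predictors.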