We pursue tractable Bayesian analysis of generalized linear models (GLMs) for categorical data. To date, such GLMs have been difficult to scale to more than a few dozen categories, owing either to non-conjugacy or to strong posterior dependencies when conjugate auxiliary-variable methods are used. We define a new class of GLMs for categorical data called categorical-from-binary (CB) models. Each CB model has a likelihood that is bounded by the product of binary likelihoods, suggesting a natural posterior approximation. This approximation makes inference straightforward and fast: using well-known auxiliary variables for probit or logistic regression, the product of binary models admits conjugate closed-form variational inference that is embarrassingly parallel across categories and invariant to category ordering. Moreover, a single independent binary model simultaneously approximates multiple CB models, and Bayesian model averaging over these can improve the quality of the approximation for any given dataset. We show that our approach scales to thousands of categories, outperforming posterior estimation competitors such as Automatic Differentiation Variational Inference (ADVI) and No-U-Turn Sampling (NUTS) in the time required to achieve a fixed prediction quality.
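To make the relationship between a categorical likelihood and a product of binary likelihoods concrete, here is a minimal NumPy sketch. It rests on an assumption that the abstract does not state: the categorical probabilities are formed by conditioning K independent binary outcomes on exactly one of them being "on". The function names and the choice of a logistic link are illustrative only and are not taken from the paper. Under this construction the categorical likelihood is never smaller than the matching product of binary likelihoods, and with a logistic link it reduces to the familiar softmax.

```python
import numpy as np


def binary_probs(eta):
    """Logistic link applied independently to each category's linear predictor."""
    return 1.0 / (1.0 + np.exp(-eta))


def one_hot_binary_likelihood(h):
    """Product-of-binaries likelihood of each one-hot outcome.

    Entry [n, k] equals h[n, k] * prod_{j != k} (1 - h[n, j]).
    """
    log_off = np.log1p(-h)  # log(1 - h), computed stably
    log_joint = np.log(h) - log_off + log_off.sum(axis=1, keepdims=True)
    return np.exp(log_joint)


def categorical_from_binary(h):
    """Assumed construction: condition the K binaries on exactly one being on."""
    joint = one_hot_binary_likelihood(h)
    return joint / joint.sum(axis=1, keepdims=True)


# Hypothetical linear predictors for 5 observations and 4 categories.
rng = np.random.default_rng(0)
eta = rng.normal(size=(5, 4))

h = binary_probs(eta)
joint = one_hot_binary_likelihood(h)
cat = categorical_from_binary(h)

# Reference softmax, for comparison with the logistic-link case.
softmax = np.exp(eta - eta.max(axis=1, keepdims=True))
softmax /= softmax.sum(axis=1, keepdims=True)
```

Because the normalizing constant is the probability that exactly one binary is on, it is at most 1, so `cat >= joint` holds elementwise. Each column of `h` depends only on its own column of `eta`, which is what would let per-category binary fits run in parallel in a setting like this.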