This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited to modeling binary data and that yield interpretable factors. We factorize the Bernoulli parameter and place an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they exploit the Beta prior only as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice, it reduces to a uniform or uninformative prior. Moreover, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior, whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.
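To make the model class concrete, the following is a minimal sketch of a mean-parametrized Bernoulli factorization with a Beta prior on one factor, together with the negative log-posterior that a maximum a posteriori method would minimize. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the other factor is constrained, so the rows of `W` are assumed here to lie on the probability simplex (which keeps every entry of `W @ H` in [0, 1]), and the names `sample_model`, `neg_log_posterior`, `alpha`, and `beta` are hypothetical. The majorization-minimization updates themselves are not reproduced.

```python
import numpy as np


def sample_model(n_rows, n_cols, rank, alpha=1.0, beta=1.0, seed=0):
    """Draw binary data V ~ Bernoulli(W @ H) with a Beta(alpha, beta) prior on H.

    Assumption (not stated in the abstract): each row of W sums to one, so
    every entry of W @ H is a convex combination of values in [0, 1].
    """
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(rank), size=n_rows)    # shape (n_rows, rank)
    H = rng.beta(alpha, beta, size=(rank, n_cols))   # entries in [0, 1]
    P = np.clip(W @ H, 0.0, 1.0)                     # guard against round-off
    V = rng.binomial(1, P)                           # binary observations
    return V, W, H, P


def neg_log_posterior(V, W, H, alpha=1.0, beta=1.0, eps=1e-12):
    """Negative log-posterior of (W, H) given V, up to an additive constant."""
    P = np.clip(W @ H, eps, 1.0 - eps)
    Hc = np.clip(H, eps, 1.0 - eps)
    log_lik = np.sum(V * np.log(P) + (1 - V) * np.log(1.0 - P))
    log_prior = np.sum((alpha - 1.0) * np.log(Hc) + (beta - 1.0) * np.log(1.0 - Hc))
    return -(log_lik + log_prior)


if __name__ == "__main__":
    V, W, H, P = sample_model(n_rows=20, n_cols=30, rank=4, alpha=2.0, beta=2.0)
    print(V.shape, neg_log_posterior(V, W, H, alpha=2.0, beta=2.0))
```

With `alpha = beta = 1` the prior term vanishes and the objective reduces to the plain Bernoulli negative log-likelihood, which corresponds to the uniform-prior case described above; other values let the prior shape the estimate of `H`.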