We consider independent component analysis of binary data. While fundamental in practice, this case has been much less developed than ICA for continuous data. We start by assuming a linear mixing model in a continuous-valued latent space, followed by a binary observation model. Importantly, we assume that the sources are non-stationary; this is necessary since any non-Gaussianity would essentially be destroyed by the binarization. Interestingly, the model allows for a closed-form likelihood by employing the cumulative distribution function of the multivariate Gaussian distribution. In stark contrast to the continuous-valued case, we prove non-identifiability of the model with few observed variables; our empirical results imply identifiability when the number of observed variables is higher. We present a practical method for binary ICA that uses only pairwise marginals, which are faster to compute than the full multivariate likelihood.
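To make the pairwise-marginal idea concrete, here is a minimal sketch of one segment of a model of this kind. It is an illustration under stated assumptions, not the paper's implementation: the sources are assumed zero-mean Gaussian with segment-specific variances (one simple form of non-stationarity), and the binary observation is assumed to be a threshold at zero of the mixed latent variable plus unit-variance Gaussian noise (a probit-style link). The names `simulate_segment`, `pairwise_prob_both_one`, `A`, and `source_var` are hypothetical. Under these assumptions, the probability that two observed variables are both 1 is a bivariate Gaussian CDF evaluated at the origin.

```python
import numpy as np
from scipy.stats import multivariate_normal


def simulate_segment(A, source_var, n_samples, rng):
    """Draw binary observations for one segment (assumed model, see lead-in)."""
    n_obs, n_src = A.shape
    # Gaussian sources whose variances are specific to this segment.
    s = rng.normal(size=(n_samples, n_src)) * np.sqrt(source_var)
    # Linear mixing in the latent space plus unit-variance noise (assumption).
    z = s @ A.T + rng.normal(size=(n_samples, n_obs))
    # Binary observation model: threshold at zero (assumption).
    return (z > 0).astype(int)


def pairwise_prob_both_one(A, source_var, i, j):
    """P(y_i = 1, y_j = 1) under the assumed model, via the bivariate Gaussian CDF."""
    cov = A @ np.diag(source_var) @ A.T + np.eye(A.shape[0])
    sub = cov[np.ix_([i, j], [i, j])]
    # For a zero-mean Gaussian, P(z_i > 0, z_j > 0) equals the CDF at the origin.
    return multivariate_normal(mean=np.zeros(2), cov=sub).cdf(np.zeros(2))


rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [-0.3, 1.2], [0.8, -0.7]])  # hypothetical mixing matrix
source_var = np.array([2.0, 0.5])                      # hypothetical segment variances

y = simulate_segment(A, source_var, 200_000, rng)
empirical = np.mean(y[:, 0] * y[:, 1])
model = pairwise_prob_both_one(A, source_var, 0, 1)
print(f"empirical: {empirical:.4f}  model: {model:.4f}")
```

Each pairwise term needs only a two-dimensional CDF, whereas the full likelihood of an observation vector needs a CDF whose dimension equals the number of observed variables. That difference is the reason pairwise marginals are cheaper to evaluate.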