We study the problem of learning a latent variable model from a stream of data. Latent variable models are popular in practice because they can explain observed data in terms of unobserved concepts. These models have traditionally been studied in the offline setting. Online EM is arguably the most popular algorithm for learning latent variable models online. Although it is computationally efficient, it typically converges to a local optimum. In this work, we develop a new online learning algorithm for latent variable models, which we call SpectralLeader. SpectralLeader always converges to the global optimum, and we derive an $O(\sqrt{n})$ upper bound, up to log factors, on its $n$-step regret in the bag-of-words model. In both synthetic and real-world experiments, we show that SpectralLeader performs similarly to or better than online EM with tuned hyper-parameters.