We study the problem of learning a latent variable model from a stream of data. Latent variable models are popular in practice because they explain observed data in terms of unobserved concepts. These models have traditionally been studied in the offline setting. In the online setting, by contrast, online EM is arguably the most popular algorithm for learning latent variable models. Although online EM is computationally efficient, it typically converges to a local optimum. In this work, we develop a new online learning algorithm for latent variable models, which we call SpectralLeader. SpectralLeader always converges to the global optimum, and we derive a sublinear upper bound on its $n$-step regret in the bag-of-words model. In both synthetic and real-world experiments, we show that SpectralLeader performs similarly to or better than online EM with tuned hyperparameters.