Online Matrix Factorization (OMF) is a fundamental tool for dictionary learning problems, giving an approximate representation of complex data sets in terms of a reduced number of extracted features. Convergence guarantees for most OMF algorithms in the literature assume independence between data matrices, and the case of dependent data streams remains largely unexplored. In this paper, we show that a non-convex generalization of the well-known OMF algorithm for i.i.d. data streams in \citep{mairal2010online} converges almost surely to the set of critical points of the expected loss function, even when the data matrices are functions of an underlying Markov chain satisfying a mild mixing condition. This allows one to extract features more efficiently from dependent data streams, since there is no need to subsample the data sequence to approximately satisfy the independence assumption. As the main application, by combining online non-negative matrix factorization with a recent MCMC algorithm for sampling motifs from networks, we propose a novel framework of Network Dictionary Learning, which extracts ``network dictionary patches'' encoding the main features of a given network in an online manner. We demonstrate this technique and its application to network denoising problems on real-world network data.
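To make the setting concrete, the following is a minimal illustrative sketch of online matrix factorization with non-negativity constraints, in the spirit of the i.i.d. algorithm of \citep{mairal2010online}: each incoming data column is coded against the current dictionary, sufficient statistics are accumulated, and the dictionary is refined by block coordinate descent on a surrogate loss. All variable names, step sizes, and iteration counts here are our own illustrative choices, not those of the paper.

```python
import numpy as np

def online_nmf(stream, d, r, n_steps, lam=0.0, seed=0):
    """Sketch of online non-negative matrix factorization.

    stream  : iterable yielding non-negative data vectors of length d
    d, r    : data dimension and number of dictionary columns
    lam     : optional l1 penalty on the codes (illustrative)
    """
    rng = np.random.default_rng(seed)
    W = np.abs(rng.standard_normal((d, r)))  # dictionary (non-negative init)
    A = np.zeros((r, r))                     # running sum of h h^T
    B = np.zeros((d, r))                     # running sum of x h^T
    for _, x in zip(range(n_steps), stream):
        # Coding step: approximate non-negative code by projected gradient.
        h = rng.random(r)
        L = np.linalg.norm(W.T @ W, 2) + 1e-8  # Lipschitz constant of the gradient
        for _ in range(50):
            grad = W.T @ (W @ h - x) + lam
            h = np.maximum(h - grad / L, 0.0)
        # Accumulate sufficient statistics of the surrogate loss.
        A += np.outer(h, h)
        B += np.outer(x, h)
        # Dictionary step: one pass of block coordinate descent,
        # projecting each column onto the non-negative unit ball.
        for j in range(r):
            if A[j, j] > 1e-10:
                W[:, j] = np.maximum(W[:, j] + (B[:, j] - W @ A[:, j]) / A[j, j], 0.0)
                W[:, j] /= max(np.linalg.norm(W[:, j]), 1.0)
    return W
```

In the dependent-data setting studied in the paper, the same updates are applied while the vectors `x` arrive as functions of a Markov chain rather than as i.i.d. samples.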