Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon that has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to learning under such group structure that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by incorporating this geometry directly into a penalty that is effectively computed via local heat-flow dynamics on the network. Moreover, we demonstrate a procedure for constructing such a network from the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the heat-flow dynamics for a time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. We validate our approach through successful applications to real-world data from a wide array of domains, including computer science, genetics, climatology and economics. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between the geometric, dynamical and stochastic structures underlying the data.
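The abstract does not specify the exact penalty or how the heat flow is discretized. As a hedged illustration of the two ingredients it names, the sketch below assumes the standard weighted graph Laplacian L = D - W, the quadratic smoothness penalty beta^T L beta, and heat flow dx/dt = -L x approximated with explicit Euler steps. The function names (`laplacian_apply`, `heat_flow`, `laplacian_penalty`) and the edge-list format are hypothetical; this is not the authors' algorithm.

```python
from typing import List, Tuple

# Illustrative assumption: an undirected weighted graph given as an edge list
# of (node_i, node_j, weight) with weight > 0. Not taken from the paper.
Edge = Tuple[int, int, float]


def laplacian_apply(n: int, edges: List[Edge], x: List[float]) -> List[float]:
    """Return L x for the weighted graph Laplacian L = D - W.

    (L x)_i = sum_j w_ij (x_i - x_j), so each entry only needs values at
    neighbouring nodes; this is what makes the dynamics local.
    """
    out = [0.0] * n
    for i, j, w in edges:
        diff = x[i] - x[j]
        out[i] += w * diff
        out[j] -= w * diff
    return out


def heat_flow(n: int, edges: List[Edge], x0: List[float],
              t: float, dt: float = 0.01) -> List[float]:
    """Approximate exp(-t L) x0 with explicit Euler steps of dx/dt = -L x."""
    degree = [0.0] * n
    for i, j, w in edges:
        degree[i] += w
        degree[j] += w
    steps = max(1, round(t / dt))
    h = t / steps
    # lambda_max(L) <= 2 * max degree, so h * max degree <= 1 keeps Euler stable.
    if h * max(degree, default=0.0) > 1.0:
        raise ValueError("time step too large for a stable explicit Euler scheme")
    x = list(x0)
    for _ in range(steps):
        lx = laplacian_apply(n, edges, x)
        x = [xi - h * li for xi, li in zip(x, lx)]
    return x


def laplacian_penalty(edges: List[Edge], beta: List[float]) -> float:
    """Smoothness penalty beta^T L beta = sum over edges of w_ij (beta_i - beta_j)^2."""
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)


if __name__ == "__main__":
    # Toy graph: two triangles {0,1,2} and {3,4,5} joined by a weak bridge 2-3.
    edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
             (2, 3, 0.01)]
    x = heat_flow(6, edges, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], t=5.0)
    print([round(v, 3) for v in x])
    # Coefficients constant within each community are penalized only via the bridge.
    print(laplacian_penalty(edges, [1, 1, 1, -1, -1, -1]))
```

In the toy graph, heat started at node 0 equalizes across the first triangle well before much of it crosses the weak bridge, so at moderate t the values are nearly constant within each community. Under the assumptions above, this is the intuition for why running the dynamics for a limited time can expose group structure without an explicit clustering step. Heat flow also conserves the total mass, since the rows of L sum to zero.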