Real-world data generation often involves complex inter-dependencies among instances, violating the IID assumption of standard learning paradigms and posing a challenge for uncovering the geometric structures needed to learn the desired instance representations. To this end, we introduce an energy-constrained diffusion model that encodes a batch of instances from a dataset into evolutionary states which progressively incorporate information from other instances through their interactions. The diffusion process is constrained by a descent criterion w.r.t. a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory implying closed-form optimal estimates of the pairwise diffusion strength between arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitively large numbers of instances, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.
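To make the linear-complexity claim concrete, below is a minimal PyTorch sketch (not the released implementation) of one diffusion step in the spirit of the simple variant. It assumes a closed-form attention of the form a_ij = 1 + ẑ_i·ẑ_j over L2-normalized states and an explicit-Euler update with step size τ; the function name `difformer_s_step` and the parameter `tau` are illustrative choices. The key point is the factorization that avoids materializing the N×N attention matrix, reducing all-pairs propagation to O(N d²).

```python
# A minimal sketch of one linear-complexity diffusion step, assuming an
# attention a_ij = 1 + z_hat_i . z_hat_j over L2-normalized states and an
# explicit-Euler update; not the authors' released implementation.
import torch

def difformer_s_step(z: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """One all-pairs diffusion step over N instance states z of shape (N, d).

    The N x N attention matrix is never materialized: both the weighted sum
    and the row normalizer factor through d-dimensional aggregates.
    """
    n = z.shape[0]
    z_hat = z / z.norm(dim=1, keepdim=True).clamp_min(1e-6)      # (N, d)

    # sum_j a_ij * z_j  =  sum_j z_j  +  z_hat_i @ (z_hat^T @ z)
    num = z.sum(dim=0, keepdim=True) + z_hat @ (z_hat.T @ z)     # (N, d)
    # Row normalizer: sum_j a_ij = N + z_hat_i . (sum_j z_hat_j)
    den = n + z_hat @ z_hat.sum(dim=0)                           # (N,)

    propagated = num / den.unsqueeze(1)                          # (N, d)
    # Euler step on the diffusion: mix current and propagated states.
    return (1.0 - tau) * z + tau * propagated
```

Stacking K such steps amounts to running the diffusion for K discrete time steps; a full encoder would interleave this propagation with learned feature transformations.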