Discrete flow matching, a recent framework for modeling categorical data, has shown performance competitive with autoregressive models. However, unlike in continuous flow matching, the rectification strategy cannot be applied because discrete paths are stochastic, so alternative methods are needed to minimize the number of state transitions. We propose a dynamic-optimal-transport-like minimization objective and derive its Kantorovich formulation for discrete flows with convex interpolants, in which the transport cost depends solely on inter-state similarity and can be optimized via minibatch strategies. For flows sourced from bag-of-words (BoW) distributions, we show that such methods can reduce the number of transitions needed to reach the same generative perplexity by a factor of up to 8 (from 1024 to 128) without compromising diversity. Additionally, path nondeterminism in discrete flows precludes an analogue of the instantaneous change-of-variables formula, so the exact likelihood estimation available to continuous flows is out of reach. We therefore propose two upper bounds on perplexity, enabling principled training, evaluation, and model comparison. Finally, we introduce Multimask Flows, which outperform masked flows in generative perplexity, particularly when combined with minibatch optimal transport, without sacrificing diversity.
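As a point of reference, the Kantorovich problem underlying such an objective takes the following standard form (a sketch under generic assumptions; the coupling set and cost written here are placeholders, not necessarily the paper's exact definitions):

$$\pi^\star \;=\; \arg\min_{\pi \in \Pi(p_0,\, p_1)} \; \mathbb{E}_{(x_0,\, x_1) \sim \pi}\big[\, c(x_0, x_1) \,\big],$$

where $\Pi(p_0, p_1)$ denotes the set of couplings with marginals $p_0$ (the source, e.g., a BoW distribution) and $p_1$ (the data), and $c(x_0, x_1)$ is a cost that depends only on the similarity between the discrete states $x_0$ and $x_1$. Minibatch strategies approximate $\pi^\star$ by solving this problem restricted to the empirical samples within each batch.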
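Concretely, a minibatch pairing step along these lines might look like the minimal sketch below, assuming Hamming distance as the inter-state cost and the Hungarian algorithm (via scipy.optimize.linear_sum_assignment) as the solver; the function names and cost choice are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of minibatch optimal-transport pairing for discrete
# sequences. Assumptions: Hamming distance as the inter-state cost and
# an exact assignment solver; both are illustrative, not the paper's
# exact method.
import torch
from scipy.optimize import linear_sum_assignment


def hamming_cost(x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """Pairwise Hamming distance between two batches of token sequences.

    x0, x1: integer tensors of shape (batch, seq_len).
    Returns a (batch, batch) cost matrix.
    """
    # Compare every source sequence against every target sequence.
    return (x0[:, None, :] != x1[None, :, :]).float().mean(dim=-1)


def minibatch_ot_pairing(x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """Reorder the source batch x0 so that pairs (x0[i], x1[i]) minimize
    the total transport cost within the minibatch, i.e., the Kantorovich
    problem restricted to permutation couplings of the batch."""
    cost = hamming_cost(x0, x1).cpu().numpy()
    row_ind, col_ind = linear_sum_assignment(cost)
    # For a square cost matrix, row_ind is 0..B-1 and source i is matched
    # to target col_ind[i]; invert the matching so pairs share an index.
    perm = torch.as_tensor(col_ind).argsort()
    return x0[perm]
```

In use, the matched pairs would replace independently sampled (source, data) pairs in the flow-matching loss, the idea being that lower-cost couplings yield fewer state transitions along the learned paths.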