Generative Flow Networks (GFlowNets) are powerful samplers for compositional objects that, by design, sample proportionally to a given non-negative reward. Nonetheless, in practice, they often struggle to explore the reward landscape evenly: trajectories toward easy-to-reach regions dominate training, while hard-to-reach modes receive vanishing or uninformative gradients, leading to poor coverage of high-reward areas. We address this imbalance with Boosted GFlowNets, a method that sequentially trains an ensemble of GFlowNets, each optimizing a residual reward that compensates for the mass already captured by previous models. This residual principle reactivates learning signals in underexplored regions and, under mild assumptions, ensures a monotone non-degradation property: adding boosters cannot worsen the learned distribution and typically improves it. Empirically, Boosted GFlowNets achieve substantially better exploration and sample diversity on multimodal synthetic benchmarks and peptide design tasks, while preserving the stability and simplicity of standard trajectory-balance training.
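The residual principle described above can be illustrated with a small numerical sketch. Here a first model under-covers a hard-to-reach high-reward mode; a booster trained on the clipped residual reward concentrates on what the first model missed, and the resulting mixture moves closer to the target distribution. The specific residual form `max(R - m * p_prev, 0)` and the mass-based mixture weight are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def residual_reward(reward, prev_probs, prev_mass):
    # Subtract the reward mass already captured by earlier ensemble
    # members, clipping at zero to keep the residual non-negative.
    # (Illustrative form; the paper's residual may differ.)
    return np.maximum(reward - prev_mass * prev_probs, 0.0)

# Toy target over 5 terminal states; state 4 is a hard high-reward mode.
reward = np.array([1.0, 1.0, 1.0, 1.0, 5.0])
target = reward / reward.sum()

# Suppose the base GFlowNet under-covered the hard mode:
p0 = np.array([0.3, 0.3, 0.2, 0.15, 0.05])
mass0 = 0.6 * reward.sum()  # reward mass attributed to p0 (assumed)

# An ideal booster samples proportionally to the residual reward.
r1 = residual_reward(reward, p0, mass0)
p1 = r1 / r1.sum()

# Mixture of base model and booster, weighted by captured mass.
w0 = mass0 / reward.sum()
mix = w0 * p0 + (1.0 - w0) * p1
```

In this toy setting the mixture's L1 distance to the target drops well below that of the base model alone, mirroring the monotone non-degradation property: the booster reactivates learning signal exactly where the base model left reward mass uncaptured.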