Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers in which an agent learns a stochastic policy for generating complex combinatorial structures through a sequence of decision-making steps. Despite being inspired by reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In this work, we adopt a distributional paradigm for GFlowNets, turning each flow function into a distribution and thus providing more informative learning signals during training. By parameterizing each edge flow through its quantile function, our proposed \textit{quantile matching} GFlowNet learning algorithm is able to learn a risk-sensitive policy, an essential component for handling scenarios with reward uncertainty. Moreover, we find that, thanks to this richer training signal, the distributional approach achieves substantial improvements over prior methods on existing benchmarks, even in settings with deterministic rewards.
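To make the quantile-matching idea concrete, the sketch below (a hypothetical toy setup, not the paper's implementation) shows the core ingredient: the quantile-regression (pinball) loss, whose expectation is minimized exactly when the estimate equals the $\tau$-quantile of a stochastic target. Here a single scalar estimate `theta` stands in for one edge-flow quantile, fit against noisy log-reward samples by stochastic subgradient descent; all names and constants are illustrative assumptions.

```python
import numpy as np

def pinball_loss(tau, predicted_quantile, sample):
    """Quantile-regression (pinball) loss. Its expectation over samples
    is minimized when predicted_quantile equals the tau-quantile of the
    sample distribution, which is what lets a quantile-parameterized
    flow be trained from stochastic reward signals."""
    diff = sample - predicted_quantile
    return np.where(diff >= 0, tau * diff, (tau - 1.0) * diff)

# Toy illustration (hypothetical): fit the 0.9-quantile of a stochastic
# "log-reward" N(1.0, 0.5^2) with constant-step subgradient descent.
rng = np.random.default_rng(0)
tau = 0.9
theta = 0.0   # scalar quantile estimate standing in for one edge flow
lr = 0.05
for _ in range(5000):
    r = rng.normal(loc=1.0, scale=0.5)   # stochastic log-reward sample
    # subgradient of pinball_loss with respect to theta
    grad = -tau if r >= theta else (1.0 - tau)
    theta -= lr * grad
# theta should hover near the true 0.9-quantile, 1.0 + 0.5 * 1.2816 ≈ 1.64
```

A risk-sensitive policy then follows by acting on a distorted expectation of these learned quantiles (e.g. averaging only over low or high $\tau$), rather than the mean flow.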