We present a novel framework for learning generative models with goal-conditioned reinforcement learning. We define two agents: a goal-conditioned agent (GC-agent) and a supervised agent (S-agent). Given a user-provided initial state, the GC-agent learns to reconstruct the training set; in this context, the elements of the training set are the goals. During training, the S-agent learns to imitate the GC-agent while remaining agnostic of the goals. At inference time we generate new samples with the S-agent. Following a route similar to that of variational auto-encoders, we derive an upper bound on the negative log-likelihood that consists of a reconstruction term and a divergence between the GC-agent policy and the (goal-agnostic) S-agent policy. We empirically demonstrate that our method generates diverse and high-quality samples on the task of image synthesis.
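For intuition, a plausible shape of such a bound, written by analogy to the VAE evidence lower bound, is sketched below; the symbols $\pi_{\mathrm{GC}}$, $\pi_{\mathrm{S}}$, the trajectory $\tau$, and the per-step form of the divergence are illustrative assumptions, not the paper's exact derivation:

\[
-\log p_\theta(x) \;\le\; \underbrace{\mathbb{E}_{\tau \sim \pi_{\mathrm{GC}}(\cdot \mid s_0,\, x)}\big[-\log p_\theta(x \mid \tau)\big]}_{\text{reconstruction term}} \;+\; \underbrace{\mathbb{E}_{t}\Big[ D_{\mathrm{KL}}\big(\pi_{\mathrm{GC}}(\cdot \mid s_t, x)\,\big\|\, \pi_{\mathrm{S}}(\cdot \mid s_t)\big)\Big]}_{\text{policy divergence}},
\]

where $x$ is a training sample acting as the goal, $s_0$ is the user-provided initial state, and the divergence term penalizes the mismatch between the goal-conditioned policy and the goal-agnostic policy used at inference.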