Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure, such as blocked Gibbs sampling, that may take many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code, which uses an adversarially learned iterative procedure to gradually refine the joint distribution $p(x, z)$ so that it better matches the data distribution at each step. GibbsNet is the best of both worlds, in both theory and practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the global minimum of the virtual training criterion) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and can perform attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model that is not trained for any of these specific tasks. We show empirically that GibbsNet learns a more complex $p(z)$, and that this leads to improved inpainting, iterative refinement of $p(x, z)$ for dozens of steps, and stable generation without collapse for thousands of steps, despite being trained on only a few steps.
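To make the adversarially learned iterative procedure concrete, the following is a minimal PyTorch sketch, not the paper's implementation: `encoder`, `decoder`, `discriminator`, `latent_dim`, and `n_steps` are illustrative placeholders, the networks are deterministic MLP stand-ins, and the stochastic ALI-style conditionals used in practice are omitted for brevity. The unclamped chain seeds $z$ from a simple distribution and alternates decoding and encoding for a few steps; the discriminator is trained to distinguish those $(x, z)$ pairs from clamped pairs built from real data.

```python
# Minimal sketch of GibbsNet-style clamped/unclamped chains (hypothetical shapes and
# modules; the paper uses convolutional, stochastic encoder/decoder networks).
import torch
import torch.nn as nn

latent_dim, data_dim, n_steps = 64, 784, 3  # illustrative sizes

# Simple MLP stand-ins for the paper's networks.
encoder = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
discriminator = nn.Sequential(nn.Linear(data_dim + latent_dim, 256), nn.ReLU(),
                              nn.Linear(256, 1))

def unclamped_chain(batch_size):
    """Iteratively refine (x, z): seed z from N(0, I), then alternate decode/encode."""
    z = torch.randn(batch_size, latent_dim)   # the simple distribution only seeds the chain
    for _ in range(n_steps):
        x = decoder(z)                        # x drawn from p(x | z)
        z = encoder(x)                        # z drawn from q(z | x)
    return x, z

def clamped_chain(x_data):
    """Inference chain: x is clamped to real data, z comes from the encoder."""
    return x_data, encoder(x_data)

# Adversarial criterion: the discriminator separates clamped from unclamped (x, z) pairs.
bce = nn.BCEWithLogitsLoss()
x_real = torch.rand(32, data_dim)             # placeholder data batch
x_fake, z_fake = unclamped_chain(32)
x_clamp, z_clamp = clamped_chain(x_real)
d_real = discriminator(torch.cat([x_clamp, z_clamp], dim=1))
d_fake = discriminator(torch.cat([x_fake, z_fake], dim=1))
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
```

At sampling time only the unclamped chain is run, which is why generation keeps the speed of a directed model while the learned $p(z)$ is shaped by the encoder rather than fixed in advance.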