Conditional inference on arbitrary subsets of variables is a core problem in probabilistic inference with important applications such as masked language modeling and image inpainting. In recent years, the family of Any-Order Autoregressive Models (AO-ARMs) -- closely related to popular models such as BERT and XLNet -- has shown breakthrough performance in arbitrary conditional tasks across a broad range of domains. Yet in spite of their success, in this paper we identify significant room for improvement in previous formulations of AO-ARMs. First, we show that AO-ARMs suffer from redundancy in their probabilistic model, i.e., they define the same distribution in multiple different ways. We alleviate this redundancy by training on a smaller set of univariate conditionals that still maintains support for efficient arbitrary conditional inference. Second, we upweight the training loss for univariate conditionals that are evaluated more frequently during inference. Our method leads to improved performance with no compromises on tractability, giving state-of-the-art likelihoods in arbitrary conditional modeling on text (Text8), image (CIFAR10, ImageNet32), and continuous tabular data domains.
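To make the training setup concrete, below is a minimal sketch of the standard AO-ARM masked-training objective that the abstract builds on: sample a random ordering and a random prefix length per example, mask the remaining positions, and score the model's univariate conditionals on the masked tokens. The names `model`, `mask_id`, and the `weight_fn` hook are assumptions for illustration and not the paper's code; `weight_fn` marks where a reweighting of conditionals by their inference-time frequency, as described above, could plug in.

```python
import torch
import torch.nn.functional as F


def ao_arm_loss(model, x, mask_id, weight_fn=None):
    """Sketch of an AO-ARM training-loss estimate (illustrative, not the paper's code).

    model(x_in) is assumed to return per-position logits of shape (B, n, vocab).
    weight_fn(t, n) is a hypothetical hook for upweighting conditionals by how
    often inference would evaluate them.
    """
    B, n = x.shape
    # Sample a uniformly random ordering per example via random ranks,
    # and a prefix length t = number of observed tokens.
    rank = torch.argsort(torch.rand(B, n), dim=-1)
    t = torch.randint(0, n, (B, 1))
    masked = rank >= t  # True -> token is hidden from the model
    x_in = torch.where(masked, torch.full_like(x, mask_id), x)

    logits = model(x_in)  # (B, n, vocab)
    nll = F.cross_entropy(logits.transpose(1, 2), x, reduction="none")  # (B, n)

    # Averaging the NLL over masked positions and scaling by n yields an
    # unbiased estimate of the sequence NLL under uniformly random orderings.
    num_masked = masked.sum(-1).clamp(min=1)
    per_example = n * (nll * masked).sum(-1) / num_masked

    if weight_fn is not None:
        # Hypothetical reweighting of conditionals by inference-time usage.
        per_example = per_example * weight_fn(t.squeeze(-1), n)
    return per_example.mean()
```

Any model that maps a partially masked sequence to per-position logits (e.g., a BERT-style encoder) can be dropped in for `model`; the redundancy the paper targets arises because many (ordering, prefix) pairs define the same univariate conditional.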