We consider one-shot probabilistic decoders that map a vector-shaped prior to a distribution over sets or graphs. These functions can be integrated into variational autoencoders (VAEs), generative adversarial networks (GANs), or normalizing flows, and have important applications in drug discovery. Set and graph generation is most commonly performed by generating points (and sometimes edge weights) i.i.d. from a normal distribution, then processing them along with the prior vector using Transformer layers or graph neural networks. This architecture is designed to generate exchangeable distributions (all permutations of a set are equally likely), but it is hard to train due to the stochasticity of i.i.d. generation. We propose a new definition of equivariance and show that exchangeability is in fact unnecessary in VAEs and GANs. We then introduce Top-n, a deterministic, non-exchangeable set creation mechanism that learns to select the most relevant points from a trainable reference set. Top-n can replace i.i.d. generation in any VAE or GAN -- it is easier to train and better captures complex dependencies in the data. Top-n outperforms i.i.d. generation by 15% on SetMNIST reconstruction, generates sets that are 64% closer to the true distribution on a synthetic molecule-like dataset, and generates more diverse molecules when trained on the classical QM9 dataset. With improved foundations in one-shot generation, our algorithm contributes to the design of more effective molecule generation methods.
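The Top-n selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes cosine-similarity scoring between the latent vector and the reference points, uses a hard (non-differentiable) top-n cut, and all function and variable names are hypothetical. The actual mechanism in the paper additionally keeps selection differentiable during training.

```python
import numpy as np

def top_n_select(latent, reference, n):
    """Hypothetical sketch of Top-n set creation: score each point of a
    trainable reference set against the latent vector, then keep the n
    highest-scoring points (deterministic, non-exchangeable)."""
    # Cosine similarity between the latent vector and each reference point.
    ref_norm = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    lat_norm = latent / np.linalg.norm(latent)
    scores = ref_norm @ lat_norm
    # Deterministically pick the n best-matching reference points.
    idx = np.argsort(-scores)[:n]
    return reference[idx]

rng = np.random.default_rng(0)
reference = rng.normal(size=(32, 8))  # trainable reference set: 32 points, dim 8
latent = rng.normal(size=8)           # vector-shaped prior / latent code
points = top_n_select(latent, reference, n=5)
print(points.shape)  # (5, 8)
```

Unlike i.i.d. sampling, the same latent vector always yields the same initial set, which removes the sampling stochasticity that the abstract identifies as the source of training difficulty.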