Exploiting Inductive Bias in Transformers for Unsupervised Disentanglement of Syntax and Semantics with VAEs

We propose a generative model for text that exhibits disentangled latent representations of syntax and semantics. Unlike previous work, this model needs neither syntactic information such as constituency parses nor semantic information such as paraphrase pairs. It relies solely on the inductive bias found in attention-based architectures such as Transformers. In Transformer attention, keys handle information selection while values specify what information is conveyed. Our model, dubbed QKVAE, uses attention in its decoder to read latent variables, where one latent variable infers keys while another infers values. Experiments on the latent representations and on syntax/semantics transfer show that QKVAE displays clear signs of disentangled syntax and semantics. We also show that our model displays competitive syntax transfer capabilities when compared to supervised models, and that comparable supervised models need a fairly large amount of data (more than 50K samples) to outperform it on both syntactic and semantic transfer. The code for our experiments is publicly available.
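The key/value split described in the abstract can be illustrated with a small NumPy sketch. This is not the released QKVAE code: the function name `latent_cross_attention`, the projection matrices `w_k` and `w_v`, all shapes, and the choice of feeding the syntax latent to the keys and the semantic latent to the values are assumptions made for illustration (the abstract states only that one latent variable infers keys and another infers values). The sketch shows the inductive bias at work: with the key-side latent held fixed, the attention weights do not change when the value-side latent is swapped.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_cross_attention(queries, z_syn, z_sem, w_k, w_v):
    """Single-head attention over latent variables (illustrative only).

    queries: (T, d) decoder states
    z_syn:   (n, d_z) latent vectors projected to keys (assumed: syntax)
    z_sem:   (n, d_z) latent vectors projected to values (assumed: semantics)
    """
    keys = z_syn @ w_k                                    # (n, d): where to look
    values = z_sem @ w_v                                  # (n, d): what is conveyed
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])   # (T, n)
    weights = softmax(scores, axis=-1)                    # each row sums to 1
    return weights @ values, weights

# Hypothetical sizes and random stand-ins for learned parameters and latents.
rng = np.random.default_rng(0)
T, n, d_z, d = 5, 4, 8, 16
queries = rng.normal(size=(T, d))
w_k = rng.normal(size=(d_z, d))
w_v = rng.normal(size=(d_z, d))
z_syn_a, z_syn_b = rng.normal(size=(n, d_z)), rng.normal(size=(n, d_z))
z_sem_a, z_sem_b = rng.normal(size=(n, d_z)), rng.normal(size=(n, d_z))

out_aa, w_aa = latent_cross_attention(queries, z_syn_a, z_sem_a, w_k, w_v)
out_ab, w_ab = latent_cross_attention(queries, z_syn_a, z_sem_b, w_k, w_v)  # swap values
out_ba, w_ba = latent_cross_attention(queries, z_syn_b, z_sem_a, w_k, w_v)  # swap keys
```

Comparing `w_aa` with `w_ab` and `w_ba` makes the separation concrete: swapping the value-side latent changes the decoder's output but leaves the selection pattern untouched, while swapping the key-side latent changes the selection pattern itself. Whether this maps cleanly onto semantics and syntax in a trained model is the empirical question the experiments address.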
