Linking neural representations to linguistic factors is crucial for building and analyzing NLP models that humans can interpret. Among these factors, syntactic roles (e.g. subjects, direct objects, $\dots$) and their realizations are essential markers, since they can be understood as a decomposition of predicative structures and thus of the meaning of sentences. Starting from a deep probabilistic generative model with attention, we measure the interaction between latent variables and realizations of syntactic roles, and we show that it is possible to obtain, without supervision, representations of sentences in which different syntactic roles correspond to clearly identified different latent variables. The probabilistic model we propose is an Attention-Driven Variational Autoencoder (ADVAE). Drawing inspiration from Transformer-based machine translation models, ADVAEs enable the analysis of the interactions between latent variables and input tokens through attention. We also develop an evaluation protocol to measure disentanglement with regard to the realizations of syntactic roles. This protocol is based on attention maxima for the encoder and on latent variable perturbations for the decoder. Our experiments on raw English text from the SNLI dataset show that $\textit{i)}$ disentanglement of syntactic roles can be induced without supervision, $\textit{ii)}$ ADVAE separates syntactic roles better than classical sequence VAEs and Transformer VAEs, and $\textit{iii)}$ realizations of syntactic roles can be separately modified in sentences by mere intervention on the associated latent variables. Our work constitutes a first step towards unsupervised controllable content generation. The code for our work is publicly available.
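As a rough illustration of the decoder-side part of this protocol, the sketch below resamples one latent variable at a time and counts how often each syntactic role's realization changes in the decoded sentence. The `decode` and `extract_roles` functions are hypothetical toy stand-ins (not the paper's actual ADVAE decoder or role extractor); a diagonal-heavy influence matrix would indicate disentanglement.

```python
import random

def decode(z):
    # Toy decoder: each latent coordinate independently selects one
    # syntactic role's realization, so perturbations are easy to trace.
    subjects = ["the cat", "a dog", "the child"]
    verbs = ["sees", "chases", "greets"]
    objects = ["a bird", "the ball", "a friend"]
    return " ".join([subjects[z[0] % 3], verbs[z[1] % 3], objects[z[2] % 3]])

def extract_roles(sentence):
    # Toy role extractor; a real one would rely on a dependency parser.
    words = sentence.split()
    return {"subj": " ".join(words[:2]), "verb": words[2], "obj": " ".join(words[3:])}

def influence_matrix(z, num_samples=20):
    # For each latent variable z[i], resample it and record which
    # syntactic roles change relative to the unperturbed decoding.
    base = extract_roles(decode(z))
    counts = {i: {r: 0 for r in base} for i in range(len(z))}
    for i in range(len(z)):
        for _ in range(num_samples):
            z_new = list(z)
            z_new[i] = random.randrange(100)  # perturb a single variable
            roles = extract_roles(decode(z_new))
            for r in base:
                counts[i][r] += roles[r] != base[r]
    return counts  # high counts mark which variable controls which role

print(influence_matrix([0, 1, 2]))
```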