Text autoencoders are often used for unsupervised conditional text generation by applying mappings in the latent space to change attributes to the desired values. Recently, Mai et al. (2020) proposed Emb2Emb, a method to learn these mappings in the embedding space of an autoencoder. However, their method is restricted to autoencoders with a single-vector embedding, which limits how much information can be retained. We address this issue by extending their method to Bag-of-Vectors Autoencoders (BoV-AEs), which encode the text into a variable-size bag of vectors that grows with the size of the text, as in attention-based models. This allows encoding and reconstructing much longer texts than standard autoencoders. Analogously to conventional autoencoders, we propose regularization techniques that facilitate learning meaningful operations in the latent space. Finally, we adapt Emb2Emb for a training scheme that learns to map an input bag to an output bag, including a novel loss function and neural architecture. Our empirical evaluations on unsupervised sentiment transfer show that our method performs substantially better than a standard autoencoder.
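To make the contrast with single-vector autoencoders concrete, the following is a minimal sketch (not the paper's implementation) of a bag-of-vectors encoding, assuming a PyTorch Transformer encoder; all module and variable names are hypothetical.

```python
# Minimal sketch: a bag-of-vectors encoder returns one latent vector per token,
# so the latent representation grows with the text, whereas a single-vector
# autoencoder would pool everything into one fixed-size vector.
import torch
import torch.nn as nn


class BagOfVectorsEncoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 256, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Output shape (batch, seq_len, d_model): a "bag" whose size
        # grows with the length of the input text.
        return self.encoder(self.embed(token_ids))


if __name__ == "__main__":
    enc = BagOfVectorsEncoder(vocab_size=1000)
    short = torch.randint(0, 1000, (1, 8))   # 8-token sentence
    long = torch.randint(0, 1000, (1, 64))   # 64-token paragraph
    bag_short, bag_long = enc(short), enc(long)
    single_vector = bag_short.mean(dim=1)    # a single-vector bottleneck would pool here
    print(bag_short.shape, bag_long.shape, single_vector.shape)
    # torch.Size([1, 8, 256]) torch.Size([1, 64, 256]) torch.Size([1, 256])
```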