The unavailability of parallel corpora for training text style transfer (TST) models is a challenging yet common scenario. TST models also implicitly need to preserve content while transforming a source sentence into the target style. To tackle these problems, an intermediate representation is often constructed that is devoid of style while still preserving the meaning of the source sentence. In this work, we study the usefulness of Abstract Meaning Representation (AMR) graphs as the intermediate style-agnostic representation. We posit that semantic notations like AMR are a natural choice for an intermediate representation. Hence, we propose T-STAR: a model comprising two components, a text-to-AMR encoder and an AMR-to-text decoder. We propose several modeling improvements to enhance the style agnosticity of the generated AMR. To the best of our knowledge, T-STAR is the first work that uses AMR as an intermediate representation for TST. Through thorough experimental evaluation, we show that T-STAR significantly outperforms state-of-the-art techniques, achieving on average 15.2% higher content preservation with a negligible loss (approximately 3%) in style accuracy. Through detailed human evaluation with 90,000 ratings, we also show that T-STAR has up to 50% fewer hallucinations than state-of-the-art TST models.
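The two-component design described above can be pictured as a composition of two functions, with the AMR graph as the only thing passed between them. The following is a minimal sketch under stated assumptions, not the T-STAR implementation: the class name `TwoStageStyleTransfer`, the toy lookup tables, and the hand-written AMR string are all hypothetical stand-ins for trained text-to-AMR and AMR-to-text models.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a linearized AMR graph is carried around as a plain string.
AMRGraph = str


@dataclass
class TwoStageStyleTransfer:
    """Illustrative pipeline: source text -> AMR -> text in the target style."""

    text_to_amr: Callable[[str], AMRGraph]  # stand-in for a style-agnostic encoder
    amr_to_text: Callable[[AMRGraph], str]  # stand-in for a target-style decoder

    def transfer(self, sentence: str) -> str:
        # The decoder only ever sees the AMR, never the source wording.
        amr = self.text_to_amr(sentence)
        return self.amr_to_text(amr)


# Hand-written AMR for illustration only; a real system would predict it.
WISE_AMR = "(w / wise :domain (y / you))"

# Toy encoder: two stylistic variants map to the same meaning representation.
TOY_ENCODER = {
    "thou art wise": WISE_AMR,
    "you are wise": WISE_AMR,
}

# Toy decoder: realizes the AMR in the (hypothetical) target style.
TOY_DECODER = {
    WISE_AMR: "you are wise",
}

model = TwoStageStyleTransfer(
    text_to_amr=lambda s: TOY_ENCODER[s.lower()],
    amr_to_text=lambda g: TOY_DECODER[g],
)

result = model.transfer("Thou art wise")
print(result)
```

Because both source variants collapse to the same AMR string, the decoder cannot copy stylistic cues from the input; this is the intuition behind using a style-agnostic intermediate representation, shown here only with toy lookups.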