CUT: Controllable Unsupervised Text Simplification

In this paper, we focus on the challenge of learning controllable text simplification in unsupervised settings. While this problem has previously been discussed for supervised learning algorithms, the literature on its analogues in unsupervised methods is scarce. We propose two unsupervised mechanisms for controlling the complexity of the generated text: back-translation with control tokens (a learning-based approach) and simplicity-aware beam search (a decoding-based approach). We show that by nudging a back-translation algorithm to understand the relative simplicity of a text in comparison to its noisy translation, the algorithm self-supervises to produce output of the desired complexity. This approach achieves competitive performance on well-established benchmarks: a SARI score of 46.88 and a Flesch-Kincaid Grade Level (FKGL) of 3.65 on the Newsela dataset.
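The two control mechanisms can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes FKGL as the simplicity signal, a crude vowel-run syllable counter, and hypothetical names (`control_token`, `rerank`, the `<SIMPLER>` tag, the `weight` parameter). Only the FKGL formula itself is standard.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level; lower values mean simpler text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def control_token(noisy_source: str, target: str) -> str:
    """Hypothetical tag: is the target simpler than its noisy translation?"""
    return "<SIMPLER>" if fkgl(target) < fkgl(noisy_source) else "<NOT_SIMPLER>"


def rerank(candidates, target_grade: float, weight: float = 1.0):
    """Hypothetical simplicity-aware rescoring of beam candidates.

    candidates: list of (text, log_prob) pairs. Each candidate is
    penalised by its distance from the requested grade level.
    """
    def score(item):
        text, log_prob = item
        return log_prob - weight * abs(fkgl(text) - target_grade)

    return sorted(candidates, key=score, reverse=True)


beam = [
    ("The feline positioned itself upon the rug.", -1.0),
    ("The cat sat on the mat.", -1.2),
]
best_text, _ = rerank(beam, target_grade=2.0)[0]
print(best_text)
print(control_token(beam[0][0], beam[1][0]))
```

In this sketch the tag would be prepended to the input during back-translation training, while `rerank` stands in for a penalty applied at decoding time; a real beam search would apply the penalty to partial hypotheses at each step rather than to finished candidates.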
