The paper presents a first attempt at unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework comprises a shared encoder and a pair of attention-based decoders that learn both text simplification and complexification through discriminator-based losses, back-translation, and denoising. The framework is trained on unlabeled text collected from an English Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on a public test set shows that our model performs simplification at both the lexical and syntactic levels, competitively with existing supervised methods. We open-source our implementation for academic use.
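To make the architecture concrete, below is a minimal PyTorch sketch of the shared-encoder, dual-decoder layout described above. It is an illustrative assumption, not the authors' implementation: the GRU cells, dot-product attention, module names, and dimensions are all placeholder choices, and the discriminator-based and denoising loss terms are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoder(nn.Module):
    """Single-step GRU decoder with dot-product attention (illustrative)."""
    def __init__(self, vocab_size, emb_dim, hid_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(2 * hid_dim, vocab_size)

    def forward(self, tok, hidden, enc_outs):
        # tok: (B, 1) previous token; hidden: (1, B, H); enc_outs: (B, T, H)
        emb = self.embed(tok)                                   # (B, 1, E)
        dec_out, hidden = self.gru(emb, hidden)                 # (B, 1, H)
        scores = torch.bmm(dec_out, enc_outs.transpose(1, 2))   # (B, 1, T)
        attn = F.softmax(scores, dim=-1)
        ctx = torch.bmm(attn, enc_outs)                         # (B, 1, H)
        logits = self.out(torch.cat([dec_out, ctx], dim=-1))    # (B, 1, V)
        return logits, hidden

class SharedEncoderTS(nn.Module):
    """One shared encoder; two attention decoders, one per direction
    (simplification vs. complexification). A sketch, not the paper's code."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.dec_simple = AttnDecoder(vocab_size, emb_dim, hid_dim)
        self.dec_complex = AttnDecoder(vocab_size, emb_dim, hid_dim)

    def forward(self, src, tgt, direction="simplify"):
        # src/tgt: (B, T) token ids. Teacher-forced decoding; in training,
        # src would be a noised copy of tgt (denoising) or a round-trip
        # output from the opposite decoder (back-translation).
        enc_outs, hidden = self.encoder(self.embed(src))
        decoder = self.dec_simple if direction == "simplify" else self.dec_complex
        logits = []
        for t in range(tgt.size(1)):
            step_logits, hidden = decoder(tgt[:, t:t+1], hidden, enc_outs)
            logits.append(step_logits)
        return torch.cat(logits, dim=1)  # (B, T, V)
```

Routing both decoders through the same encoder is what lets the model share a common sentence representation across the simple and complex domains; the direction-specific decoders then specialize that representation toward each style.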