In this paper we present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability. Our approach builds on and extends the unsupervised lexical simplification system with pretrained encoders (LSBert) in the following ways: For the subtask of simplification candidate selection, it uses a RoBERTa transformer language model and expands the size of the generated candidate list. For the subsequent substitution ranking, it introduces a new feature weighting scheme and adopts a candidate filtering method based on textual entailment to maximize semantic similarity between the target word and its simplification. Our best-performing system improves over LSBert by 5.9% in accuracy and achieves second place out of 33 ranked solutions.
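To illustrate the candidate selection step described above, the following is a minimal sketch of generating substitution candidates with a RoBERTa masked language model. The abstract does not specify implementation details, so the model checkpoint, the Hugging Face fill-mask pipeline, the example sentence, and the candidate list size are assumptions made purely for illustration, not the authors' actual setup.

```python
# Illustrative sketch only: masking the target word and asking a RoBERTa
# masked language model for probable fillers; increasing top_k expands
# the generated candidate list. Model name and parameters are assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

sentence = "The cat perched on the mat."
target = "perched"

# Replace the target word with RoBERTa's mask token and retrieve the
# model's most probable substitutions in context.
masked = sentence.replace(target, fill_mask.tokenizer.mask_token, 1)
candidates = fill_mask(masked, top_k=30)

for c in candidates:
    print(c["token_str"].strip(), round(c["score"], 4))
```

In a full system, such raw candidates would still need the subsequent ranking and filtering stages (e.g., feature weighting and entailment-based filtering) before a simplification is selected.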