Gender-inclusive language is important for achieving gender equality in languages with gender inflections, such as German. Although it stirs some controversy, it is increasingly adopted by companies and political institutions. A handful of tools have been developed to help people use gender-inclusive language by identifying instances of the generic masculine and suggesting more inclusive reformulations. In this report, we define the underlying tasks in terms of natural language processing and present a dataset and measures for benchmarking them. We also present a model that implements these tasks by combining an inclusive language database with an elaborate sequence of processing steps built on standard pre-trained models. On our benchmark for identifying exclusive language, the model achieves a recall of 0.89 and a precision of 0.82; in real-world texts, one of its top five suggestions is chosen in 44% of cases. We sketch how the area could be advanced further by training end-to-end models and by using large language models, and we urge the community to include more gender-inclusive texts in their training data so that their models do not present an obstacle to the adoption of gender-inclusive language. Through these efforts, we hope to contribute to restoring justice in language and, to a small extent, in reality.
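The benchmark figures reported here rest on standard evaluation measures. The following is a minimal sketch of how detection precision and recall, and a top-5 acceptance rate for suggestions, could be computed. It assumes that flagged instances of the generic masculine are represented as character-offset spans matched exactly against gold annotations; the report's actual matching criterion may differ. The function names and the example words are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a flagged instance is a (start, end) character span.
Span = Tuple[int, int]


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Span-level precision and recall, assuming exact span matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def top_k_hit_rate(suggestions: List[List[str]], chosen: List[str], k: int = 5) -> float:
    """Fraction of cases in which the reformulation a writer actually chose
    appears among the model's first k suggestions for that instance."""
    if not chosen:
        return 0.0
    hits = sum(1 for sugg, choice in zip(suggestions, chosen) if choice in sugg[:k])
    return hits / len(chosen)


if __name__ == "__main__":
    # Toy data: two of three predicted spans are correct, two of three gold spans are found.
    gold = {(0, 6), (20, 27), (40, 48)}
    predicted = {(0, 6), (20, 27), (55, 60)}
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67

    # Toy suggestion lists and the reformulations writers actually chose.
    suggestions = [["Lehrkräfte", "Lehrer*innen"], ["Studierende"], ["Mitarbeitende"]]
    chosen = ["Lehrer*innen", "Studenten und Studentinnen", "Mitarbeitende"]
    print(f"top-5 hit rate={top_k_hit_rate(suggestions, chosen):.2f}")  # top-5 hit rate=0.67
```

Precision penalizes spurious flags and recall penalizes missed instances of the generic masculine, which is why both are reported; the top-k rate instead measures whether a usable reformulation is offered at all, regardless of its rank within the first five.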