Studying the role of named entities for content preservation in text style transfer

Text style transfer techniques are gaining popularity in Natural Language Processing, finding various applications such as text detoxification, sentiment transfer, or formality transfer. However, most existing approaches were tested on domains such as online communication on public platforms, music, or entertainment, yet none of them has been applied to domains typical of task-oriented production systems, such as arranging personal plans (e.g., booking flights or reserving a table in a restaurant). We fill this gap by studying formality transfer in this domain. We noted that texts in this domain are full of named entities, which are crucial for preserving the original meaning of the text. For example, if someone communicates the destination city of a flight, it must not be altered. Thus, we concentrate on the role of named entities in content preservation for formality text style transfer. We collect a new dataset for evaluating content similarity measures in text style transfer. It is drawn from a corpus of task-oriented dialogues and contains many important entities related to realistic requests, which makes this dataset particularly useful for testing style transfer models before deploying them in production. In addition, we perform an error analysis of a pre-trained formality transfer model and introduce a simple technique that uses information about named entities to enhance the performance of baseline content similarity measures used in text style transfer.
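The abstract does not specify how named-entity information is combined with the baseline measures. Purely as a hypothetical illustration, assuming the source entities have already been extracted by some NER system, one could interpolate a baseline similarity score with the share of source entities that survive the transfer. The bag-of-words cosine below is only a stand-in for the baseline measures studied in the paper, and the names `bow_cosine`, `entity_recall`, `ne_aware_similarity` and the weight `alpha` are illustrative assumptions, not the authors' method.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def bow_cosine(a, b):
    # Stand-in baseline content similarity: cosine over bag-of-words counts.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def entity_recall(source_entities, target):
    # Share of source entities found as whole words (case-insensitive) in the target.
    if not source_entities:
        return 1.0  # nothing to lose
    t = target.lower()
    kept = sum(
        1 for e in source_entities
        if re.search(r"\b" + re.escape(e.lower()) + r"\b", t)
    )
    return kept / len(source_entities)


def ne_aware_similarity(source, target, source_entities, alpha=0.5):
    # Hypothetical combination: linear interpolation of baseline score and entity recall.
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1 - alpha) * bow_cosine(source, target) + alpha * entity_recall(source_entities, target)


# Illustrative example: an informal request and two formal rewrites.
src = "i wanna fly to Boston on Friday"
good = "I would like to fly to Boston on Friday."
bad = "I would like to fly to Denver on Friday."
ents = ["Boston", "Friday"]  # assumed output of an NER step

print(entity_recall(ents, good), entity_recall(ents, bad))  # 1.0 0.5
print(ne_aware_similarity(src, good, ents), ne_aware_similarity(src, bad, ents))
```

The rewrite that changes the destination city loses only one shared word, so the baseline score alone drops modestly; the entity term penalizes it further, which is the kind of content loss the abstract argues matters in task-oriented dialogues.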
