Ad hoc abbreviations are commonly found in informal communication channels that favor shorter messages. We consider the task of reversing these abbreviations in context to recover normalized, expanded versions of abbreviated messages. The problem is related to, but distinct from, spelling correction, in that ad hoc abbreviations are intentional and may differ substantially from the original words. Ad hoc abbreviations are productively generated on the fly, so they cannot be resolved solely by dictionary lookup. We generate a large, open-source data set of ad hoc abbreviations. We use this data to study abbreviation strategies and to develop two strong baselines for abbreviation expansion.