Pretrained language models based on the transformer architecture have shown great success in NLP. Textual training data often comes from the web and is thus tagged with time-specific information, but most language models ignore this information. They are trained on the textual data alone, which limits their ability to generalize temporally. In this work, we extend the key component of the transformer architecture, i.e., the self-attention mechanism, and propose temporal attention, a time-aware self-attention mechanism. Temporal attention can be applied to any transformer model and requires the input texts to be accompanied by their relevant time points. It allows the transformer to capture this temporal information and create time-specific contextualized word representations. We leverage these representations for the task of semantic change detection: we apply our proposed mechanism to BERT and experiment on three datasets in different languages (English, German, and Latin) that also vary in time, size, and genre. Our proposed model achieves state-of-the-art results on all three datasets.
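To make the idea of a time-aware self-attention mechanism concrete, the following is a minimal NumPy sketch, not the exact formulation used in this work (the abstract does not state it). It assumes that each token carries a time embedding in a matrix `T` alongside the usual queries, keys, and values, and that the time embeddings rescale the query-key scores through the term `T.T @ T / ||T||`. The function names, the matrix `T`, and the `time_table` lookup in the usage note are all hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(Q, K, V):
    """Standard scaled dot-product attention for one sequence.

    Q, K, V have shape (n_tokens, d). Returns (output, weights).
    """
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights


def temporal_attention(Q, K, V, T):
    """Time-aware variant (illustrative assumption, see lead-in).

    T has shape (n_tokens, d) and holds one time embedding per token.
    The (d, d) matrix T.T @ T / ||T|| is placed between the queries
    and keys, so the attention weights depend on the time point.
    """
    d_k = Q.shape[-1]
    time_mix = T.T @ T / np.linalg.norm(T)  # Frobenius norm of T
    weights = softmax(Q @ time_mix @ K.T / np.sqrt(d_k))
    return weights @ V, weights
```

A possible usage, assuming every token of a text shares the time point of that text: look up one row of a hypothetical learned table `time_table` of shape `(n_time_points, d)`, repeat it for all tokens with `np.tile(time_table[t], (n_tokens, 1))`, and pass the result as `T`. The same text paired with two different time points then yields different attention weights, which is what makes the resulting word representations time-specific.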