In the last few years, the release of BERT, a multilingual transformer-based model, has taken the NLP community by storm. BERT-based models have achieved state-of-the-art results on various NLP tasks, including dialog tasks. One of the limitations of BERT is its inability to handle long text sequences: by default, BERT has a maximum wordpiece token sequence length of 512. Recently, there has been renewed interest in addressing this limitation through new self-attention-based architectures. However, there has been little to no research on the impact of this limitation on dialog tasks. Dialog tasks are inherently different from other NLP tasks because of: a) the presence of multiple utterances from multiple speakers, which may be interlinked across different turns, and b) the longer length of dialogs. In this work, we empirically evaluate the impact of dialog length on the performance of a BERT model for the Next Response Selection dialog task on four publicly available multi-turn dialog datasets and one internal one. We observe that long dialogs have little impact on performance, and even the simplest approach of truncating the input works well.
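To make the 512-wordpiece limit and the truncation baseline concrete, the following is a minimal sketch (not the paper's exact pipeline) of how a multi-turn dialog context and a candidate response might be encoded and truncated for next response selection using the Hugging Face `transformers` tokenizer; the model name, dialog text, and candidate response are illustrative assumptions.

```python
# Minimal sketch, assuming the Hugging Face transformers library and
# bert-base-uncased; the dialog and candidate below are made-up examples.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical multi-turn dialog context, joined turn by turn.
context = " [SEP] ".join([
    "Hi, I need help with my order.",
    "Sure, can you share the order number?",
    "It's 12345, the package never arrived.",
])
candidate = "I'm sorry to hear that; let me check the shipping status."

# Pair the context with the candidate response and truncate to BERT's
# 512-token limit; "longest_first" drops tokens from the longer segment.
encoded = tokenizer(
    context,
    candidate,
    truncation="longest_first",
    max_length=512,
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 512])
```

This simple truncation strategy is the baseline referred to in the abstract; more elaborate approaches (e.g., keeping only the most recent turns) would replace the truncation step while leaving the rest of the encoding unchanged.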