Recent large-scale neural autoregressive sequence models have shown impressive performance on a variety of natural language generation tasks. However, their generated sequences often exhibit degenerate properties, such as non-termination, undesirable repetition, and premature termination, when decoded with algorithms such as greedy search, beam search, top-$k$ sampling, and nucleus sampling. In this paper, we focus on the problem of non-terminating sequences resulting from an incomplete decoding algorithm. We first define an incomplete probable decoding algorithm, which includes greedy search, top-$k$ sampling, and nucleus sampling, generalizing the incomplete decoding algorithm originally put forward by Welleck et al. (2020). We then propose a non-monotonic self-terminating language model, which significantly relaxes the constraint of monotonically increasing termination probability in the self-terminating language model originally proposed by Welleck et al. (2020), to address the issue of non-terminating sequences when using incomplete probable decoding algorithms. We prove that our proposed model prevents non-terminating sequences not only with incomplete probable decoding algorithms but also with beam search. We empirically validate our model on sequence completion tasks with various architectures.
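As a minimal illustration of the non-termination issue the abstract refers to, the sketch below (not from the paper; the toy vocabulary, probabilities, and helper function are illustrative assumptions) shows how an incomplete decoder such as top-$k$ sampling can assign zero probability to the end-of-sequence token at a decoding step, making termination impossible at that step.

```python
# Minimal sketch, assuming a toy vocabulary and next-token distribution.
# top_k_filter is a hypothetical helper mirroring top-k sampling's proposal
# distribution: keep only the k most probable tokens and renormalize.

def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; all other tokens,
    including <eos>, receive probability 0 under the truncated distribution."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}

# Toy next-token distribution at some step t: the model assigns <eos> a
# nonzero probability, but <eos> is not among the k most likely tokens.
p_next = {"the": 0.40, "a": 0.30, "cat": 0.20, "<eos>": 0.10}

filtered = top_k_filter(p_next, k=3)
print(filtered)                    # {'the': ..., 'a': ..., 'cat': ...}
print(filtered.get("<eos>", 0.0))  # 0.0 -> the decoder cannot terminate here
```

If this exclusion of <eos> recurs at every step, the decoded sequence never terminates, which is the failure mode the proposed non-monotonic self-terminating language model is designed to rule out.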