Neural-network-based text generation suffers from degeneration issues such as repetition. Although top-k sampling and nucleus sampling outperform beam-search-based decoding methods, they focus only on truncating the "tail" of the distribution and do not address the "head," which we show may contain tedious or even repetitive high-probability candidates that lead to repetition loops. They also do not fully address the fact that human text does not always favor high-probability words. To improve diversity in text generation, we propose a heuristic sampling method inspired by inverse probability weighting. We use the interquartile range of the predicted distribution to determine the "head," then permute and rescale the "head" with inverse probabilities. This decreases the probability of tedious and possibly repetitive higher-probability candidates, while increasing the probability of rational but more surprising lower-probability candidates. The proposed algorithm provides a controllable variation of the predicted distribution that enhances diversity without compromising its rationality. Using a pre-trained language model, we compare our algorithm with nucleus sampling. Results show that our algorithm effectively increases the diversity of generated samples while remaining close to human text.
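As a rough illustration of the idea (not the paper's exact procedure), the sketch below detects a high-probability "head" with an interquartile-range outlier rule and rescales it with inverse probability weights. The specific cutoff (`q3 + 1.5 * IQR`) and the mass-preserving rescale are assumptions made for this sketch; only the IQR-based head detection and the inverse-probability reweighting are taken from the description above.

```python
import numpy as np

def iqr_inverse_rescale(probs):
    """Reweight the 'head' of a next-token distribution with inverse
    probabilities. The head is detected as IQR outliers (an assumed
    criterion for this sketch)."""
    probs = np.asarray(probs, dtype=float)
    q1, q3 = np.percentile(probs, [25, 75])
    head = probs > q3 + 1.5 * (q3 - q1)   # unusually high-probability tokens
    out = probs.copy()
    if head.sum() > 1:
        mass = probs[head].sum()          # preserve the head's total mass
        inv = 1.0 / probs[head]           # inverse probability weights
        out[head] = mass * inv / inv.sum()
    return out / out.sum()

def sample_token(probs, rng=None):
    """Draw one token index from the reweighted distribution."""
    rng = rng or np.random.default_rng()
    p = iqr_inverse_rescale(probs)
    return int(rng.choice(len(p), p=p))
```

With two head tokens, the inverse weighting simply swaps their probabilities, so the most likely candidate becomes slightly less likely than its runner-up; the tail is left untouched, which is how the scheme adds surprise without promoting irrational candidates.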