Poetry generation has been a difficult task in natural language processing. Unlike plain neural text generation, poetry places a high demand on novelty: an easily understood sentence with too many high-frequency words may not be considered poetic, while a suitably ambiguous sentence with low-frequency words can be novel and creative. Inspired by this, we present Lingxi, a diversity-aware Chinese modern poetry generation system. We propose the nucleus sampling with randomized head (NS-RH) algorithm, which randomizes the high-frequency part ("head") of the predicted distribution in order to emphasize the "comparatively low-frequency" words. The proposed algorithm significantly increases the novelty of generated poetry compared with traditional sampling methods. The permutation of the distribution is controlled by tuning the filtering parameter that determines the "head" to permute, achieving diversity-aware sampling. We find that even when a large portion of the filtered vocabulary is randomized, the system still generates fluent poetry, with notably higher novelty. We also propose a semantic-similarity-based rejection sampling algorithm, which creates a longer and more informative context from the short input poetry title while maintaining high semantic similarity to the title, alleviating the off-topic issue.
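As an illustration of the NS-RH idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the nucleus is the smallest set of top-ranked tokens whose cumulative probability reaches `top_p`, that the "head" is the smallest top-ranked prefix of that nucleus whose cumulative probability reaches a filtering parameter `head_p`, and that "randomizing" means shuffling the probability values among the head tokens. The names `ns_rh_weights`, `ns_rh_sample`, `top_p`, and `head_p` are hypothetical and do not come from the source.

```python
import random


def ns_rh_weights(probs, top_p=0.9, head_p=0.5, rng=None):
    """Return (nucleus_ids, weights) after shuffling the head's probabilities.

    Assumption: `probs` is a list of next-token probabilities summing to 1.
    """
    rng = rng or random.Random()
    # Rank token ids from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Nucleus: smallest top-ranked prefix whose cumulative mass reaches top_p.
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Head: smallest prefix of the nucleus whose cumulative mass reaches head_p
    # (assumed definition of the filtering parameter).
    head_len, cum = 0, 0.0
    for i in nucleus:
        head_len += 1
        cum += probs[i]
        if cum >= head_p:
            break

    # Shuffle the probability values among head tokens; the tail is untouched.
    weights = [probs[i] for i in nucleus]
    head = weights[:head_len]
    rng.shuffle(head)
    weights[:head_len] = head
    return nucleus, weights


def ns_rh_sample(probs, top_p=0.9, head_p=0.5, rng=None):
    """Draw one token id from the nucleus using the head-shuffled weights."""
    rng = rng or random.Random()
    nucleus, weights = ns_rh_weights(probs, top_p, head_p, rng)
    # random.choices normalizes the weights internally.
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

Under these assumptions, a larger `head_p` shuffles more of the nucleus, so the most frequent tokens lose more of their advantage over the comparatively low-frequency ones, while tokens outside the nucleus are never sampled.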