Language is the principal tool for human communication, and humor is one of its most attractive aspects. Producing human-like natural language with computers, known as Natural Language Generation (NLG), has been widely used for dialogue systems, chatbots, and machine translation, as well as computer-aided creation, e.g., idea generation and scriptwriting. However, the humor aspect of natural language is relatively under-investigated, especially in the age of pre-trained language models (PLMs). In this work, we aim to preliminarily test whether NLG can generate humor as humans do. We build a new dataset consisting of numerous digitized Chinese Comical Crosstalk scripts (C$^3$ for short), drawn from a popular Chinese performing art called `Xiangsheng' that has been performed since the 1800s. (For the convenience of non-Chinese speakers, we refer to `Xiangsheng' as `crosstalk' in this paper.) We benchmark various generation approaches, including Seq2seq models trained from scratch, fine-tuned middle-scale PLMs, and large-scale PLMs (with and without fine-tuning). Moreover, we conduct a human assessment, showing that 1) large-scale pretraining largely improves crosstalk generation quality; and 2) even the scripts generated by the best PLM are far from what we expect, reaching only 65% of the quality of human-created crosstalk. We conclude that humor generation could be largely improved using large-scale PLMs, but it is still in its infancy. The data and benchmarking code are publicly available at \url{https://github.com/anonNo2/crosstalk-generation}.