Humor understanding and generation are challenging and subjective tasks even for humans, and mastering them requires commonsense and real-world knowledge. Puns, in particular, add the challenge of fusing that knowledge with the ability to interpret lexical-semantic ambiguity. In this paper, we present the ExPUNations (ExPUN) dataset, in which we augment an existing dataset of puns with detailed crowdsourced annotations: keywords denoting the most distinctive words that make the text funny, pun explanations describing why the text is funny, and fine-grained funniness ratings. This is the first humor dataset with such extensive and fine-grained annotations specifically for puns. Based on these annotations, we propose two tasks, explanation generation to aid pun classification and keyword-conditioned pun generation, which challenge the ability of current state-of-the-art natural language understanding and generation models to understand and generate humor. We show that the annotated keywords we collect help generate better novel humorous texts in human evaluation, and that our natural language explanations can be leveraged to improve both the accuracy and robustness of humor classifiers.