Jokes are intentionally written to be funny, but not all jokes are created the same. Some jokes may be fit for a classroom of kindergarteners, while others are best reserved for a more mature audience. While recent work has shown impressive results on humor detection in text, here we instead investigate the more nuanced task of detecting humor subtypes, especially of the less innocent variety. To that end, we introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a fine-tuned Transformer dubbed the Naughtyformer. Moreover, we show that our model is significantly better at detecting offensiveness in jokes than state-of-the-art methods.