The human ability of deep cognition is crucial for the development of various real-world applications that process diverse and abundant user-generated input. Recent progress in deep learning and natural language processing has enabled learning systems to reach human performance on some benchmarks that require only shallow semantics. However, as many recent studies have pointed out, this deeper ability remains challenging even for modern contextual embedding models. Existing machine comprehension datasets have one or more of three limitations: they assume sentence-level input, they lack causal or motivational inferences, or they can be answered by exploiting question-answer bias. Here, we present a challenging novel task, trope detection on films, in an effort to give machines situation and behavior understanding. Tropes are storytelling devices that are frequently used as ingredients in recipes for creative works. Compared to existing movie tag prediction tasks, trope detection is more sophisticated. Tropes can vary widely, from a moral concept to a series of circumstances, and they are embedded with motivations and cause-and-effect relations. We introduce a new dataset, Tropes in Movie Synopses (TiMoS), with 5,623 movie synopses and 95 different tropes collected from TVTropes, a Wikipedia-style database. We present a multi-stream comprehension network (MulCom) that leverages multi-level attention over words, sentences, and role relations. Experimental results show how far current models are from human performance. Modern models, including BERT contextual embeddings, movie tag prediction systems, and relational networks, reach at most 37% of human performance (23.97/64.87) in terms of F1 score. MulCom outperforms all modern baselines by 1.5 to 5.0 F1 points and by 1.5 to 3.0 points in mean average precision (mAP). We also provide a detailed analysis and human evaluation to pave the way for future research.