Most existing work on text novelty detection operates at the topic level, i.e., it identifies whether the topic of a document or sentence is novel. Little work has been done at the fine-grained semantic (or contextual) level. For example, given the background knowledge that Elon Musk is the CEO of a technology company, the sentence "Elon Musk acted in the sitcom The Big Bang Theory" is novel and surprising because a CEO would not normally be an actor. Existing topic-based novelty detection methods perform poorly on this problem because they do not carry out semantic reasoning over the relations between named entities in the text and background knowledge about them. This paper proposes an effective model, called PAT-SND, that solves this problem and can also characterize the novelty. An annotated dataset is also created. Evaluation shows that PAT-SND outperforms 10 baselines by large margins.