Recent research in novelty detection focuses mainly on document-level classification using deep neural networks (DNNs). However, the black-box nature of DNNs makes it difficult to extract an exact explanation of why a document is considered novel. In addition, handling novelty at the word level is crucial for a more fine-grained analysis than is possible at the document level. In this work, we propose a Tsetlin machine (TM)-based architecture for scoring individual words according to their contribution to novelty. Our approach encodes a description of the novel documents using the linguistic patterns captured by TM clauses. We then use this description to measure how much each word contributes to making a document novel. Our experimental results show that the approach breaks novelty down into interpretable phrases and successfully measures it.
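As a rough illustration of word-level scoring from clauses, the sketch below sums clause evidence per word. It is not the scoring function of this work: the clause representation (a signed weight plus a list of `(word, negated)` literals), the function name `score_words`, and the toy clauses are all assumptions made for the example.

```python
from collections import defaultdict


def score_words(clauses):
    """Aggregate a per-word novelty score from clause-like patterns.

    clauses: list of (weight, literals) pairs, where literals is a list of
    (word, negated) tuples. This layout is hypothetical, chosen only to
    illustrate the idea of reading word contributions off clauses.
    """
    scores = defaultdict(float)
    for weight, literals in clauses:
        for word, negated in literals:
            # A plain literal inherits the clause weight; a negated literal
            # contributes with the opposite sign.
            scores[word] += -weight if negated else weight
    return dict(scores)


# Toy clauses (illustrative values, not learned from data).
clauses = [
    (1.0, [("quantum", False), ("sports", True)]),
    (1.0, [("quantum", False), ("chip", False)]),
    (-1.0, [("sports", False)]),
]
scores = score_words(clauses)
for word, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(word, value)
```

Words that appear as plain literals in many novelty-supporting clauses rank highest, while words that appear negated there, or plain in opposing clauses, rank lowest. A real system would derive the clauses and weights from a trained TM rather than hard-coding them.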