The Tsetlin Machine (TM) is an interpretable pattern recognition algorithm based on propositional logic, which has demonstrated competitive performance in many Natural Language Processing (NLP) tasks, including sentiment analysis, text classification, and Word Sense Disambiguation. To obtain human-level interpretability, the legacy TM employs Boolean input features such as bag-of-words (BOW). However, the BOW representation makes it difficult to use any pre-trained information, for instance, word2vec and GloVe word representations. This restriction has constrained the performance of TM compared to deep neural networks (DNNs) in NLP. To reduce this performance gap, in this paper, we propose a novel way of using pre-trained word representations for TM. The approach significantly enhances both the performance and the interpretability of TM. We achieve this by extracting semantically related words from pre-trained word representations and using them as input features to the TM. Our experiments show that the accuracy of the proposed approach is significantly higher than that of the previous BOW-based TM, reaching the level of DNN-based models.
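The feature-extraction idea above can be sketched in a few lines: given pre-trained word vectors, each document word is expanded with its nearest neighbours by cosine similarity, and the resulting word set is binarized into a Boolean feature vector suitable for a TM. The toy embeddings, vector values, and the neighbour count `k` below are illustrative assumptions, not the paper's actual setup.

```python
import math

# Toy pre-trained embeddings (in practice, load word2vec or GloVe vectors).
# These vectors are hypothetical values chosen for illustration only.
embeddings = {
    "good":      [0.90, 0.10, 0.00],
    "great":     [0.80, 0.20, 0.10],
    "excellent": [0.85, 0.15, 0.05],
    "bad":       [-0.90, 0.10, 0.00],
    "terrible":  [-0.85, 0.15, 0.05],
    "movie":     [0.00, 0.90, 0.30],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def related_words(word, k=2):
    """Return the k vocabulary words most similar to `word`."""
    v = embeddings[word]
    sims = {w: cosine(v, u) for w, u in embeddings.items() if w != word}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def boolean_features(document_words, vocab, k=2):
    """Build a Boolean feature vector: a feature fires if the word itself
    or one of its k nearest semantic neighbours occurs in the document."""
    active = set()
    for w in document_words:
        if w in embeddings:
            active.add(w)
            active.update(related_words(w, k))
    return [1 if w in active else 0 for w in vocab]

vocab = sorted(embeddings)
print(related_words("good"))                      # nearest neighbours of "good"
print(boolean_features(["good", "movie"], vocab)) # expanded Boolean input
```

Compared with plain BOW, the document `["good", "movie"]` also activates features for semantically related words such as "great" and "excellent", so the TM's clauses can generalize across synonyms while the input stays Boolean and interpretable.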