The Tsetlin machine (TM) is a logic-based machine learning approach with the crucial advantages of being transparent and hardware-friendly. While TMs match or surpass deep learning accuracy for an increasing number of applications, large clause pools tend to produce clauses with many literals (long clauses). Such clauses are less interpretable. Further, longer clauses increase the switching activity of the clause logic in hardware, consuming more power. This paper introduces a novel variant of TM learning - Clause Size Constrained TMs (CSC-TMs) - where one can set a soft constraint on the clause size. As soon as a clause includes more literals than the constraint allows, it starts expelling literals. Accordingly, oversized clauses only appear transiently. To evaluate CSC-TM, we conduct classification, clustering, and regression experiments on tabular data, natural language text, images, and board games. Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals. Indeed, the accuracy increases with shorter clauses for TREC, IMDb, and BBC Sports. After the accuracy peaks, it drops gracefully as the clause size approaches a single literal. We finally analyze CSC-TM power consumption and derive new convergence properties.
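To make the soft constraint concrete, the following is a minimal Python sketch of the expulsion idea described above. It is an illustration only: the function name, the `budget` and `p_expel` parameters, and the direct bit-flip update are assumptions of this sketch; in CSC-TM the expulsion of literals is realized through Tsetlin automaton feedback during learning rather than direct manipulation of the clause.

```python
import random

def apply_size_constraint(include, budget, p_expel=0.25):
    """Illustrative soft constraint on clause size (a sketch, not
    the paper's exact update rule).

    include[k] is True when literal k is included in the clause.
    Whenever the clause holds more literals than `budget`, each
    included literal is expelled (excluded) with probability
    p_expel, so oversized clauses shrink back toward the budget
    and only appear transiently.
    """
    included = [k for k, on in enumerate(include) if on]
    if len(included) > budget:
        for k in included:
            if random.random() < p_expel:
                include[k] = False
    return include


# Example: a clause with 8 included literals and a budget of 3
# shrinks stochastically over successive training steps.
clause = [True] * 8 + [False] * 4
for step in range(20):
    clause = apply_size_constraint(clause, budget=3)
print(sum(clause))  # typically at or below the budget after a few steps
```

Because the constraint is soft, a clause may briefly exceed the budget after incorporating new literals; the stochastic expulsion then pulls it back, which is what keeps clauses short without hard-capping what the learner can explore.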