In this paper, we present a method for building strong, explainable classifiers in the form of Boolean search rules. We developed an interactive environment called CASE (Computer Assisted Semantic Exploration), which exploits word co-occurrence to guide human annotators in selecting relevant search terms. The system supports iterative evaluation and improvement of the classification rules. This process lets human annotators benefit from statistical information while incorporating their expert intuition into the rules. We evaluate classifiers created with CASE on four datasets and compare the results with machine learning methods, including SKOPE rules, Random Forest, Support Vector Machine, and fastText classifiers. The results drive a discussion of the trade-off between the superior compactness, simplicity, and intuitiveness of Boolean search rules and the better performance of state-of-the-art machine learning models for text classification.
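To make the idea concrete, the following minimal sketch shows what a Boolean search rule used as a classifier might look like, and how co-occurrence counts could suggest a term that refines it. It is an illustration under stated assumptions and not the CASE implementation: the toy documents, the labels, the representation of a rule as an OR of AND-groups, and the function names `tokenize`, `matches`, `suggest_terms`, and `precision_recall` are all hypothetical.

```python
from collections import Counter

# Toy corpus and labels (hypothetical): True marks a "battery problem" report.
docs = [
    "battery drains fast after update",
    "battery drains overnight",
    "new battery cover looks nice",
    "screen cracked after drop",
]
labels = [True, True, False, False]


def tokenize(text):
    """Lowercase whitespace tokenization into a set of words."""
    return set(text.lower().split())


def matches(rule, tokens):
    """A rule is an OR over AND-groups: it fires if any group is fully present."""
    return any(all(term in tokens for term in group) for group in rule)


def suggest_terms(corpus, seed, top_k=1):
    """Rank words by how many documents they share with the seed term."""
    counts = Counter()
    for text in corpus:
        tokens = tokenize(text)
        if seed in tokens:
            counts.update(tokens - {seed})
    return [word for word, _ in counts.most_common(top_k)]


def precision_recall(rule, corpus, gold):
    """Evaluate a rule against gold labels; returns (precision, recall)."""
    preds = [matches(rule, tokenize(text)) for text in corpus]
    tp = sum(p and y for p, y in zip(preds, gold))
    fp = sum(p and not y for p, y in zip(preds, gold))
    fn = sum(not p and y for p, y in zip(preds, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# One iteration of the loop: start broad, look at co-occurring terms, refine.
broad_rule = [["battery"]]
candidate = suggest_terms(docs, "battery", top_k=1)[0]
refined_rule = [["battery", candidate]]

print(precision_recall(broad_rule, docs, labels))
print(precision_recall(refined_rule, docs, labels))
```

In this toy setting the broad rule also fires on the unrelated "battery cover" document, while the refined rule does not. In an interactive workflow the annotator, rather than the code, would decide whether a suggested term belongs in the rule.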