Recent advances in Hierarchical Multi-label Classification (HMC), particularly neurosymbolic approaches, have demonstrated improved consistency and accuracy by enforcing constraints on a neural model during training. However, such work assumes that these constraints are given a priori. In this paper, we relax this strong assumption and present an approach based on Error Detection Rules (EDRs), which learns explainable rules about the failure modes of machine learning models. We show that these rules not only detect when a machine learning classifier has made an error but can also be leveraged as constraints for HMC, recovering explainable constraints even when none are provided. Our approach is effective at detecting machine learning errors and recovering constraints, is noise tolerant, and can serve as a source of knowledge for neurosymbolic models on multiple datasets, including a newly introduced military vehicle recognition dataset.