Learning from raw data input, thus reducing the need for feature engineering, is a component of many successful applications of machine learning methods in various domains. While many problems naturally translate into a vector representation directly usable by standard classifiers, a number of data sources have the natural form of structured data interchange formats (e.g., security logs in JSON/XML format). Existing methods, such as Hierarchical Multiple Instance Learning (HMIL), allow learning from such data in their raw form. However, the explanation of classifiers trained on raw structured data remains largely unexplored. By treating these models as subset selection problems, we demonstrate how interpretable explanations with favourable properties can be generated using computationally efficient algorithms. We compare our approach to an explanation technique adapted from graph neural networks, showing an order-of-magnitude speed-up and higher-quality explanations.