This paper contributes to interpretable machine learning via visual knowledge discovery in general line coordinates (GLC). Hyperblocks, used as interpretable dataset units, are combined with general line coordinates to create a visual self-service machine learning model. Two lossless multidimensional coordinate systems, DSC1 and DSC2, are proposed. Both can map multiple dataset attributes to a single two-dimensional (X, Y) Cartesian plane using a graph construction algorithm. Hyperblock analysis was used to determine visually appealing attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree rules and that a series of DSC1 or DSC2 plots can visualize a decision tree. The DSC1 and DSC2 plots were tested on benchmark datasets from the UCI ML repository, where they allowed for visual classification of data. Additionally, areas of hyperblock impurity were discovered and used to establish dataset splits that highlight the upper estimate of worst-case model accuracy, to guide model selection for high-risk decision-making. A major benefit of DSC1 and DSC2 is their highly interpretable nature: they allow domain experts to control or establish new machine learning models through visual pattern discovery.
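To make the link between hyperblocks and decision tree rules concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hyperblock can be modeled as an axis-aligned box with one closed interval per attribute, and it assumes impurity is the fraction of points inside the block that do not belong to the block's majority class; the paper's exact definitions may differ. The names `Hyperblock`, `contains`, and `impurity` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from math import inf
from typing import Hashable, Sequence, Tuple


@dataclass(frozen=True)
class Hyperblock:
    """Axis-aligned box: one closed interval [low, high] per attribute (assumed model)."""
    lows: Tuple[float, ...]
    highs: Tuple[float, ...]

    def contains(self, point: Sequence[float]) -> bool:
        # A point is inside when every attribute falls within its interval.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lows, point, self.highs))


def impurity(block: Hyperblock,
             points: Sequence[Sequence[float]],
             labels: Sequence[Hashable]) -> float:
    """Share of points inside the block that are not of the majority class (assumed measure)."""
    inside = [lab for p, lab in zip(points, labels) if block.contains(p)]
    if not inside:
        return 0.0  # an empty block is treated as pure by convention here
    majority_count = Counter(inside).most_common(1)[0][1]
    return 1.0 - majority_count / len(inside)


# A decision tree path such as "x0 <= 2.5 and x1 >= 1.0" is a conjunction of
# per-attribute thresholds, so it can be written as a box with open sides set to +/- infinity.
rule_block = Hyperblock(lows=(-inf, 1.0), highs=(2.5, inf))

# Toy data, invented for this illustration only.
points = [(1.0, 2.0), (2.0, 3.0), (2.5, 1.5), (4.0, 2.0)]
labels = ["a", "a", "b", "b"]

block_impurity = impurity(rule_block, points, labels)
```

In this toy setup the first three points fall inside the block and the fourth does not, so the block mixes two classes and has nonzero impurity. Regions like this are the kind of area the abstract describes as a candidate for worst-case dataset splits.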