Visualizing Machine Learning (ML) models is an important part of the ML process because it can improve both the interpretability and the prediction accuracy of the models. This paper proposes a new method, SPC-DT, for visualizing Decision Tree (DT) models as interpretable models. The method uses a version of General Line Coordinates called Shifted Paired Coordinates (SPC). In SPC, each n-D point is visualized as a directed graph in a set of shifted pairs of 2-D Cartesian coordinates. The new method expands and complements the capabilities of existing methods for visualizing DT models. It shows: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) how tight each split is to the thresholds in the DT nodes, and (5) the density of cases in parts of the n-D space. This information helps domain experts evaluate and improve DT models and their performance, including by avoiding overgeneralization and overfitting. The benefits of the method are demonstrated in case studies using three real datasets.
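The SPC mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the SPC-DT implementation: the function name `spc_polyline`, the default horizontal shift of 1.5 units per pair, and the handling of odd-dimensional points (repeating the last value) are all assumptions made for illustration. The sketch only shows the core idea of splitting an n-D point into consecutive coordinate pairs, placing each pair in its own shifted 2-D Cartesian system, and linking the resulting 2-D nodes with directed edges.

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_polyline(
    point: Sequence[float],
    shifts: Optional[Sequence[Point2D]] = None,
) -> Tuple[List[Point2D], List[Tuple[Point2D, Point2D]]]:
    """Map an n-D point to SPC nodes and the directed edges between them.

    Hypothetical helper for illustration; not taken from the paper.
    """
    values = list(point)
    if len(values) % 2 != 0:
        # Assumption: an odd-dimensional point repeats its last value
        # so that the final pair is complete.
        values.append(values[-1])

    # Consecutive pairs (x1, x2), (x3, x4), ... each get their own 2-D system.
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]

    if shifts is None:
        # Assumption: shift each coordinate system 1.5 units to the right
        # of the previous one; any layout of origins could be used instead.
        shifts = [(1.5 * k, 0.0) for k in range(len(pairs))]
    if len(shifts) != len(pairs):
        raise ValueError("need exactly one shift per coordinate pair")

    # Position of each pair on the shared drawing canvas.
    nodes = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pairs, shifts)]

    # Directed edges connect node k to node k + 1, forming the graph of the point.
    edges = list(zip(nodes[:-1], nodes[1:]))
    return nodes, edges


nodes, edges = spc_polyline([0.25, 0.5, 0.75, 1.0], shifts=[(0.0, 0.0), (2.0, 0.0)])
print(nodes)  # [(0.25, 0.5), (2.75, 1.0)]
print(edges)  # [((0.25, 0.5), (2.75, 1.0))]
```

In this sketch a 4-D point becomes two 2-D nodes joined by one arrow. In an SPC-DT-style display, DT thresholds on the paired attributes would presumably be drawn as lines or rectangles within each coordinate pair, which is beyond what the sketch covers.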