To increase the interpretability and prediction accuracy of the Machine Learning (ML) models, visualization of ML models is a key part of the ML process. Decision Trees (DTs) are essential in machine learning (ML) because they are used to understand many black box ML models including Deep Learning models. In this research, two new methods for creation and enhancement with complete visualizing Decision Trees as understandable models are suggested. These methods use two versions of General Line Coordinates (GLC): Bended Coordinates (BC) and Shifted Paired Coordinates (SPC). The Bended Coordinates are a set of line coordinates, where each coordinate is bended in a threshold point of the respective DT node. In SPC, each n-D point is visualized in a set of shifted pairs of 2-D Cartesian coordinates as a directed graph. These new methods expand and complement the capabilities of existing methods to visualize DT models more completely. These capabilities allow us to observe and analyze: (1) relations between attributes, (2) individual cases relative to the DT structure, (3) data flow in the DT, (4) sensitivity of each split threshold in the DT nodes, and (5) density of cases in parts of the n-D space. These features are critical for DT models' performance evaluation and improvement by domain experts and end users as they help to prevent overgeneralization and overfitting of the models. The advantages of this methodology are illustrated in the case studies on benchmark real-world datasets. The paper also demonstrates how to generalize them for decision tree visualizations in different General Line Coordinates.
翻译:暂无翻译