Bagging and boosting are two popular ensemble methods in machine learning (ML) that produce many individual decision trees. Owing to their ensemble nature, they typically outperform single decision trees and other ML models in predictive performance. However, each decision tree contributes numerous decision paths, which increases the overall complexity of the model and hinders its use in domains that require trustworthy and explainable decisions, such as finance, social care, and health care. Consequently, the interpretability of bagging and boosting algorithms, such as random forests and adaptive boosting, decreases as the number of decisions grows. In this paper, we propose VisRuler, a visual analytics tool that assists users in extracting decisions from such ML models through a thorough visual inspection workflow. The workflow includes selecting a set of robust and diverse models (originating from different ensemble learning algorithms), choosing important features according to their global contribution, and deciding which decisions are essential for global explanation (or for local explanation of specific cases). The outcome is a final decision based on the class agreement of several models and on the manual decisions that users explore and export. Finally, we evaluate the applicability and effectiveness of VisRuler via a use case, a usage scenario, and a user study.
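The following minimal Python sketch makes two points from the abstract concrete. First, it shows why the number of decisions grows with ensemble size. Second, it shows how several models' predictions can be combined by class agreement. It rests on two assumptions that do not come from VisRuler itself. Each root-to-leaf path of a tree is counted as one decision. The agreement rule is a plain majority vote. The helper names `total_decision_paths` and `agreement_decision` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def total_decision_paths(leaves_per_tree: Sequence[int]) -> int:
    """Count decisions in an ensemble (hypothetical helper).

    Assumption: every root-to-leaf path in a tree is one decision rule,
    so the ensemble's rule count is the sum of leaf counts over its trees.
    """
    return sum(leaves_per_tree)


def agreement_decision(predictions: Sequence[str]) -> tuple[str, float]:
    """Return the majority class and the fraction of models agreeing with it.

    Hypothetical helper: assumes a simple majority vote over per-model
    class predictions. Ties go to the class encountered first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not predictions:
        raise ValueError("need at least one model prediction")
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


if __name__ == "__main__":
    # 100 trees with 8 leaves each already yield 800 candidate decisions.
    print(total_decision_paths([8] * 100))
    # Five models (e.g., from different ensemble algorithms) vote on one case.
    print(agreement_decision(["approve", "approve", "reject", "approve", "reject"]))
```

Even a modest forest of shallow trees produces hundreds of rules, which illustrates the scalability problem the abstract describes. The agreement fraction returned alongside the label indicates how strongly the selected models concur on a case.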