Context: The identification of bugs within the reported issues in an issue tracker is crucial for the triage of issues. Machine learning models have shown promising results regarding the performance of automated issue type prediction. However, beyond our assumptions, we have only limited knowledge of how such models identify bugs. LIME and SHAP are popular techniques to explain the predictions of classifiers. Objective: We want to understand if machine learning models provide explanations for their classifications that are reasonable to us as humans and align with our assumptions of what the models should learn. We also want to know if the prediction quality is correlated with the quality of the explanations. Method: We conduct a study in which we rate LIME and SHAP explanations based on how well they explain the outcome of an issue type prediction model. For this, we rate the quality of the explanations themselves, i.e., whether they align with our expectations and whether they help us to understand the underlying machine learning model.
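For readers unfamiliar with these techniques, the sketch below illustrates how LIME and SHAP explanations could be generated for a simple issue type classifier. It is a minimal example under stated assumptions: the toy issue titles, the bag-of-words model, and the class names are illustrative and not the dataset, model, or setup used in the study.

```python
# Minimal sketch: LIME and SHAP explanations for a toy issue type classifier.
# The data, model, and class names below are illustrative assumptions only.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
import shap

# Hypothetical issue titles with labels (1 = bug, 0 = non-bug).
titles = [
    "NullPointerException when saving a project",
    "Crash on startup after upgrading to version 2.3",
    "Add dark mode to the settings page",
    "Update contributor guidelines in the README",
]
labels = [1, 1, 0, 0]

# A simple bag-of-words classifier standing in for the issue type prediction model.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(titles, labels)

issue = "Application crashes with NullPointerException on save"

# LIME: perturbs the input text and fits a local surrogate model; the returned
# (token, weight) pairs indicate which words pushed the prediction towards "bug".
lime_explainer = LimeTextExplainer(class_names=["non-bug", "bug"])
lime_exp = lime_explainer.explain_instance(issue, model.predict_proba, num_features=5)
print("LIME:", lime_exp.as_list())

# SHAP: attributes the prediction to tokens via Shapley values; here the generic
# Explainer with a text masker is used (API details may vary between SHAP versions).
shap_explainer = shap.Explainer(model.predict_proba, shap.maskers.Text(r"\W+"))
shap_values = shap_explainer([issue])
print("SHAP values for the 'bug' class:", shap_values[0, :, 1].values)
```

In a setup like this, rating an explanation would amount to judging whether the highlighted tokens (e.g., "crashes", "NullPointerException") match a human's intuition about what indicates a bug report.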