Applications of multilevel models (MLMs) usually result in binary classification within groups or hierarchies based on a set of input features. For transparent and ethical applications of such models, sound audit frameworks need to be developed. In this paper, an audit framework for the technical assessment of regression MLMs is proposed. The focus is on three aspects: model, discrimination, and transparency and explainability. These aspects are subsequently divided into sub-aspects. Contributors, such as inter-MLM group fairness, feature contribution order, and aggregated feature contribution, are identified for each of these sub-aspects. To measure the performance of the contributors, the framework proposes a shortlist of KPIs. A traffic-light risk assessment method is furthermore coupled to these KPIs. For assessing transparency and explainability, different explainability methods (SHAP and LIME) are used and compared with a model-intrinsic method using quantitative methods and machine learning modelling. Using an open-source dataset, a model is trained and tested and the KPIs are computed. It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models. They fail to predict the order of feature importance, the magnitudes, and occasionally even the nature of the feature contribution. For other contributors, such as group fairness, and their associated KPIs, similar analyses and calculations have been performed with the aim of adding depth to the proposed audit framework. The framework is expected to assist regulatory bodies in performing conformity assessments of AI systems that use multilevel binomial classification models at businesses. It will also help businesses deploying MLMs to be future-proof and aligned with the European Commission's proposed Regulation on Artificial Intelligence.
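As a rough illustration of how a KPI on feature contribution order might be coupled to a traffic-light rating, the following minimal Python sketch compares contributions from a model-intrinsic method with those from an explainability method. It is not the paper's implementation: the function names, the feature names, the numeric contributions, the two KPI definitions (rank-position agreement and sign agreement, the latter as one possible reading of the "nature" of a contribution), and the thresholds 0.9 and 0.6 are all assumptions chosen for illustration.

```python
from typing import Dict, List


def contribution_order(contribs: Dict[str, float]) -> List[str]:
    """Return features sorted by decreasing absolute contribution."""
    return sorted(contribs, key=lambda f: abs(contribs[f]), reverse=True)


def order_agreement(reference: Dict[str, float], explained: Dict[str, float]) -> float:
    """Hypothetical KPI: fraction of rank positions where both orderings agree."""
    ref = contribution_order(reference)
    exp = contribution_order(explained)
    matches = sum(r == e for r, e in zip(ref, exp))
    return matches / len(ref)


def sign_agreement(reference: Dict[str, float], explained: Dict[str, float]) -> float:
    """Hypothetical KPI: fraction of features whose contribution sign matches."""
    same = sum((reference[f] >= 0) == (explained[f] >= 0) for f in reference)
    return same / len(reference)


def traffic_light(kpi: float, green_at: float = 0.9, amber_at: float = 0.6) -> str:
    """Map a KPI in [0, 1] to a risk colour; thresholds are illustrative only."""
    if kpi >= green_at:
        return "green"
    if kpi >= amber_at:
        return "amber"
    return "red"


# Made-up contributions for three hypothetical features.
intrinsic = {"age": 0.8, "income": -0.5, "tenure": 0.1}   # model-intrinsic method
explainer = {"age": 0.3, "income": 0.6, "tenure": 0.05}   # e.g. a SHAP- or LIME-style output

order_kpi = order_agreement(intrinsic, explainer)
sign_kpi = sign_agreement(intrinsic, explainer)
print("order:", traffic_light(order_kpi))
print("sign:", traffic_light(sign_kpi))
```

In an actual audit, the contribution dictionaries would come from the fitted model and from the explainability library, and the thresholds would need to be set and justified by the auditor.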