Recent advances in Explainable AI (XAI) have increased the demand for deploying safe and interpretable AI models across industry sectors. Despite the recent success of deep neural networks in a variety of domains, understanding the decision-making process of such complex models remains a challenging task for domain experts. Especially in the financial domain, merely pointing to an anomaly composed of often hundreds of mixed-type columns has limited value for experts. Hence, in this paper, we propose a framework for explaining anomalies using denoising autoencoders designed for mixed-type tabular data. We focus specifically on anomalies that are erroneous observations. This is achieved by localizing the individual sample columns (cells) with potential errors and assigning corresponding confidence scores. In addition, the model provides expected cell value estimates to fix the errors. We evaluate our approach on three standard public tabular datasets (Credit Default, Adult, IEEE Fraud) and one proprietary dataset (Holdings). We find that denoising autoencoders applied to this task already outperform other approaches, both in cell error detection rates and in expected value rates. Additionally, we analyze how a loss specialized for cell error detection can further improve these metrics. Our framework is designed to help a domain expert understand the abnormal characteristics of an anomaly, as well as to improve in-house data quality management processes.
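The core mechanism the abstract describes is reconstruction-based cell error localization: a denoising autoencoder reconstructs each cell from the (corrupted) row, cells whose reconstruction disagrees strongly with the observed value are flagged as potential errors, and the reconstruction itself serves as the suggested fix. As a minimal illustration only — the paper's actual architecture, loss, and mixed-type handling are not specified here, and all hyperparameters below are assumptions — this numpy sketch trains a toy one-hidden-layer denoising autoencoder with tied weights on purely numeric data, then scores cells by absolute reconstruction residual:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy numeric table: 200 rows, 4 strongly correlated columns
# (stand-in for a real mixed-type table, numeric only for brevity).
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.05 * rng.normal(size=(200, 1)) for _ in range(4)])

# One-hidden-layer denoising autoencoder with tied weights (assumed setup).
d, h, lr = X.shape[1], 2, 0.1
W = rng.normal(scale=0.1, size=(d, h))

def forward(Xin, W):
    H = np.tanh(Xin @ W)   # encoder
    return H, H @ W.T      # decoder (tied weights)

for _ in range(1000):
    noisy = X + 0.1 * rng.normal(size=X.shape)  # denoising corruption
    H, Xhat = forward(noisy, W)
    G = Xhat - X                                # gradient of 0.5*MSE w.r.t. Xhat
    dA = (G @ W) * (1.0 - H ** 2)               # backprop through tanh encoder
    grad = (noisy.T @ dA + G.T @ H) / len(X)    # encoder + decoder contributions
    W -= lr * grad

# Inject a cell error into one row and localize it via reconstruction residuals.
row = X[0].copy()
row[2] = 8.0                                    # corrupted cell
_, rhat = forward(row[None, :], W)
scores = np.abs(rhat[0] - row)                  # per-cell error confidence
suspect = scores.argmax()                       # column flagged as erroneous
fix = rhat[0, suspect]                          # expected cell value estimate
print(suspect, round(float(fix), 2))
```

Because the remaining cells agree with each other, the reconstruction stays close to their consensus, so the residual is largest at the corrupted cell; `fix` plays the role of the expected value estimate the abstract mentions. A real mixed-type model would additionally need categorical heads (e.g. per-column softmax outputs), which this numeric sketch omits.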