On Predictive Explanation of Data Anomalies - 专知 (Zhuanzhi) Papers


October 18, 2021


Nikolaos Myrtakis, Ioannis Tsamardinos, Vassilis Christophides

From arXiv, 12 pages

Numerous algorithms have been proposed for detecting anomalies (outliers, novelties) in an unsupervised manner. Unfortunately, it is generally not trivial to understand why a given sample (record) is labelled as an anomaly, and thus to diagnose its root causes. We propose the following reduced-dimensionality surrogate-model approach to explain detector decisions: approximate the detection model with another one that employs only a small subset of features. Samples can then be visualized in this low-dimensional space for human understanding. To this end, we develop PROTEUS, an AutoML pipeline that produces the surrogate model and is specifically designed for feature selection on imbalanced datasets. The PROTEUS surrogate model can explain not only the training data but also out-of-sample (unseen) data. In other words, PROTEUS produces predictive explanations by approximating the decision surface of an unsupervised detector. PROTEUS is designed to return an accurate estimate of out-of-sample predictive performance, which serves as a metric of the quality of the approximation. Computational experiments confirm the efficacy of PROTEUS in producing predictive explanations for different families of detectors and in reliably estimating their predictive performance on unseen data. Unlike several ad-hoc feature importance methods, PROTEUS is robust to high-dimensional data.
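
The surrogate idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the PROTEUS pipeline: the synthetic data, the toy z-score detector, the nearest-centroid surrogate, the greedy forward selection, the assumed 2% contamination rate, and all function names are assumptions made for illustration only. The sketch treats the detector's decisions as labels, fits a surrogate on a small feature subset, and measures on held-out data how well the surrogate reproduces those decisions.

```python
import random

def make_data(n, d, n_anom, rng):
    """Synthetic records; anomalies are shifted in features 0 and 1 only (assumption)."""
    rows = []
    for i in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if i < n_anom:
            row[0] += 8.0
            row[1] -= 8.0
        rows.append(row)
    rng.shuffle(rows)
    return rows

def fit_detector(X):
    """Toy unsupervised detector: sum of squared z-scores over ALL features."""
    n, d = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(d)]
    sd = [(sum((r[j] - mu[j]) ** 2 for r in X) / n) ** 0.5 for j in range(d)]
    return lambda row: sum(((row[j] - mu[j]) / sd[j]) ** 2 for j in range(d))

def fit_surrogate(X, y, feats):
    """Supervised surrogate on a feature subset: nearest-centroid score."""
    def centroid(label):
        rows = [r for r, t in zip(X, y) if t == label]
        return {j: sum(r[j] for r in rows) / len(rows) for j in feats}
    c0, c1 = centroid(0), centroid(1)
    # Positive when the row is closer to the "anomaly" centroid than to the normal one.
    return lambda row: sum((row[j] - c0[j]) ** 2 - (row[j] - c1[j]) ** 2 for j in feats)

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count one half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_split(X, y, fractions, rng):
    """Split each class separately so every part contains both classes."""
    parts = [([], []) for _ in fractions]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        start = 0
        for p, frac in enumerate(fractions):
            last = p == len(fractions) - 1
            end = len(idx) if last else start + round(frac * len(idx))
            for i in idx[start:end]:
                parts[p][0].append(X[i])
                parts[p][1].append(y[i])
            start = end
    return parts

def select_features(Xtr, ytr, Xva, yva, k):
    """Greedy forward selection of k features by validation AUC."""
    chosen = []
    for _ in range(k):
        best_j, best_auc = None, -1.0
        for j in range(len(Xtr[0])):
            if j in chosen:
                continue
            model = fit_surrogate(Xtr, ytr, chosen + [j])
            a = auc([model(r) for r in Xva], yva)
            if a > best_auc:
                best_j, best_auc = j, a
        chosen.append(best_j)
    return chosen

rng = random.Random(0)
X = make_data(1000, 8, 20, rng)
detector = fit_detector(X)
scores = [detector(r) for r in X]
cut = sorted(scores, reverse=True)[19]      # assumed contamination: 2% of 1000 records
y = [1 if s >= cut else 0 for s in scores]  # detector decisions become the labels

(Xtr, ytr), (Xva, yva), (Xte, yte) = stratified_split(X, y, (0.6, 0.2, 0.2), rng)
selected = select_features(Xtr, ytr, Xva, yva, k=2)
surrogate = fit_surrogate(Xtr, ytr, selected)
test_auc = auc([surrogate(r) for r in Xte], yte)  # test split was never used for selection
print("selected features:", selected)
print("held-out AUC vs. detector labels:", round(test_auc, 3))
```

Because the test split plays no part in feature selection or fitting, the final AUC is an out-of-sample estimate of how faithfully the two-feature surrogate reproduces the detector, and the two selected features give a plane in which flagged records can be plotted.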

