Testing for Outliers with Conformal p-values - 专知 Papers


Conformer · Outliers · Mutually independent · False positive rate · Marginalization

April 19, 2021

Testing for Outliers with Conformal p-values


Stephen Bates, Emmanuel Candès, Lihua Lei, Yaniv Romano, Matteo Sesia

This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
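As a rough illustration of the marginal conformal p-values the abstract describes, the sketch below scores each test point against a held-out calibration set and then applies the Benjamini-Hochberg procedure. Several things here are assumptions rather than the authors' code: the abstract does not name the multiple-testing procedure, so Benjamini-Hochberg is used as a standard choice for false discovery rate control under positive dependence; the toy score (distance from the training mean) stands in for any one-class model; and the names `conformal_pvalues` and `benjamini_hochberg` are hypothetical.

```python
import bisect
import random


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values.

    Assumed convention: a larger score means "more outlying".
    p = (1 + #{calibration scores >= test score}) / (n + 1)
    """
    n = len(cal_scores)
    sorted_cal = sorted(cal_scores)
    pvals = []
    for s in test_scores:
        # bisect_left gives the index of the first calibration score >= s
        n_greater_equal = n - bisect.bisect_left(sorted_cal, s)
        pvals.append((1 + n_greater_equal) / (n + 1))
    return pvals


def benjamini_hochberg(pvals, alpha):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # keep the largest rank whose p-value is below its BH threshold
        if pvals[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Toy data (illustrative only): inliers ~ N(0, 1), outliers ~ N(6, 1).
rng = random.Random(0)
train = [rng.gauss(0, 1) for _ in range(500)]   # used to fit the score
cal = [rng.gauss(0, 1) for _ in range(999)]     # held-out calibration set
test = ([rng.gauss(0, 1) for _ in range(90)]
        + [rng.gauss(6, 1) for _ in range(10)])

# Hypothetical scoring rule: distance from the training mean.
center = sum(train) / len(train)


def score(x):
    return abs(x - center)


pvals = conformal_pvalues([score(x) for x in cal], [score(x) for x in test])
rejections = benjamini_hochberg(pvals, alpha=0.1)
print(len(rejections), "test points flagged as outliers")
```

The `+ 1` in both numerator and denominator counts the test point itself among the scores it is compared to, which is what makes each p-value valid when the test point is exchangeable with the calibration data. Because every test point is compared against the same calibration set, the resulting p-values are dependent on one another, which is the situation the abstract addresses.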


