Certifiable Robustness for Nearest Neighbor Classifiers - Zhuanzhi Papers

Robustness · Nearest neighbor classifiers · Nearest neighbors · Datasets · MoDELS

January 13, 2022

Certifiable Robustness for Nearest Neighbor Classifiers

Austen Z. Fan, Paraschos Koutris

From arXiv; accepted to ICDT'22

ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.
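The definition of certifiable robustness in the abstract can be illustrated with a small brute-force sketch. It is not the paper's algorithm: it enumerates every repair explicitly, which takes exponential time, whereas the paper studies when the problem can be solved in polynomial time. The sketch assumes that a "repair" is a maximal subset of the dataset that satisfies all FDs (a subset repair), that distance is squared Euclidean, and that vote ties go to the smallest label. The function names, attribute names, and toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def violates(t1, t2, fds):
    """True if the pair (t1, t2) violates some FD lhs -> rhs."""
    return any(
        all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs)
        for lhs, rhs in fds
    )


def is_consistent(rows, fds):
    """A set of rows is consistent if no pair of rows violates an FD."""
    return not any(violates(a, b, fds) for a, b in combinations(rows, 2))


def subset_repairs(rows, fds):
    """Enumerate all maximal consistent subsets (exponential brute force)."""
    n = len(rows)
    consistent = [
        frozenset(idx)
        for r in range(n + 1)
        for idx in combinations(range(n), r)
        if is_consistent([rows[i] for i in idx], fds)
    ]
    maximal = [s for s in consistent if not any(s < t for t in consistent)]
    return [[rows[i] for i in sorted(s)] for s in maximal]


def knn_predict(rows, x, k, features, label):
    """Majority vote among the k nearest rows (squared Euclidean distance)."""
    def dist(r):
        return sum((r[f] - x[f]) ** 2 for f in features)

    nearest = sorted(rows, key=dist)[:k]
    votes = Counter(r[label] for r in nearest)
    # Assumption: ties in the vote are broken by the smallest label.
    return min(votes, key=lambda c: (-votes[c], c))


def label_counts(rows, fds, x, k, features, label):
    """Counting version: number of repairs that predict each label."""
    return Counter(
        knn_predict(rep, x, k, features, label)
        for rep in subset_repairs(rows, fds)
    )


def certifiably_robust(rows, fds, x, k, features, label):
    """Robust if every repair yields the same prediction for x."""
    return len(label_counts(rows, fds, x, k, features, label)) == 1


# Toy inconsistent dataset: "id" should determine "x" and "y",
# but the first two rows share id 1 and disagree.
rows = [
    {"id": 1, "x": 0.0, "y": "a"},
    {"id": 1, "x": 5.0, "y": "b"},
    {"id": 2, "x": 1.0, "y": "a"},
    {"id": 3, "x": 4.0, "y": "b"},
]
fds = [(("id",), ("x", "y"))]  # FD: id -> x, y
test_point = {"x": 0.8}

print(certifiably_robust(rows, fds, test_point, 1, ["x"], "y"))
print(label_counts(rows, fds, test_point, 3, ["x"], "y"))
```

In this toy instance there are two repairs, one for each choice of the conflicting `id = 1` row. With `k = 1` both repairs pick the same nearest neighbor, so the prediction is certifiably robust. With `k = 3` the two repairs vote differently, so it is not.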

