Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high-performance coreset constructions for clustering problems such as $k$-means, both in theory and in practice. Curiously, there has been no work comparing the quality of the available $k$-means coresets. In this paper we perform such an evaluation. There is currently no known algorithm for measuring the distortion of a candidate coreset, and we provide some evidence as to why this may be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
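For reference, the notion of coreset quality discussed here is the standard $(1\pm\varepsilon)$ guarantee for $k$-means; the following is a minimal statement of that definition with illustrative notation (the symbols $P$, $\Omega$, $w$ are not taken from the paper itself):
$$
\operatorname{cost}_P(S) = \sum_{p \in P} \min_{s \in S} \lVert p - s\rVert^2,
\qquad
\operatorname{cost}_\Omega(S) = \sum_{q \in \Omega} w(q)\, \min_{s \in S} \lVert q - s\rVert^2 ,
$$
where $P \subset \mathbb{R}^d$ is the input and $\Omega$ is a weighted summary with weights $w$. Then $\Omega$ is an $\varepsilon$-coreset if, for every candidate solution $S \subset \mathbb{R}^d$ with $|S| = k$,
$$
\bigl|\operatorname{cost}_\Omega(S) - \operatorname{cost}_P(S)\bigr| \;\le\; \varepsilon \cdot \operatorname{cost}_P(S),
$$
and the distortion of a candidate coreset is the smallest such $\varepsilon$ over all solutions $S$, which is why certifying it requires reasoning over all possible sets of $k$ centers.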