Tradeoffs in Resampling and Filtering for Imbalanced Classification - Zhuanzhi (专知) Papers

August 31, 2022

Tradeoffs in Resampling and Filtering for Imbalanced Classification

Ryan Muther, David Smith

Imbalanced classification problems are extremely common in natural language processing and are solved using a variety of resampling and filtering techniques, which often involve deciding how to select training data or which test examples the model should label. We examine the tradeoffs in model performance involved in choosing training samples and filtering training and test data in a heavily imbalanced token classification task, and we study the relationship between the magnitude of these tradeoffs and the base rate of the phenomenon of interest. In experiments on sequence tagging to detect rare phenomena in English and Arabic texts, we find that different methods of selecting training data bring tradeoffs in effectiveness and efficiency. We also see that in highly imbalanced cases, filtering test data using first-pass retrieval models is as important for model performance as selecting training data. The base rate of a rare positive class has a clear effect on the magnitude of the changes in performance caused by the selection of training or test data: as the base rate increases, the differences brought about by those choices decrease.
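
The abstract does not specify the resampling scheme, the retrieval model, or any thresholds, so the following Python sketch is only a rough illustration under stated assumptions, not the authors' code. It assumes token-level binary labels, uses negative downsampling (keeping all sequences with a positive token and a random fraction of the rest) as one possible training-data selection strategy, and substitutes an arbitrary scoring function for the first-pass retrieval model. All names and parameters (`downsample_negatives`, `keep_fraction`, `first_pass_filter`, `score`, `threshold`, `precision_at_base_rate`) are hypothetical. The last function is the standard relation between base rate and precision for a classifier with fixed true- and false-positive rates.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical representation: one sequence = (tokens, per-token labels),
# where label 1 marks a token of the rare positive class and 0 everything else.
Example = Tuple[List[str], List[int]]


def downsample_negatives(data: List[Example], keep_fraction: float,
                         seed: int = 0) -> List[Example]:
    """Training-data selection (assumed scheme): keep every sequence that
    contains at least one positive token, and keep each all-negative
    sequence with probability `keep_fraction`."""
    rng = random.Random(seed)
    kept = []
    for tokens, labels in data:
        if any(labels) or rng.random() < keep_fraction:
            kept.append((tokens, labels))
    return kept


def first_pass_filter(data: List[Example],
                      score: Callable[[List[str]], float],
                      threshold: float) -> List[Example]:
    """Test-data filtering (assumed scheme): a cheap scorer stands in for a
    first-pass retrieval model; only sequences scoring at or above
    `threshold` are passed on to the (more expensive) tagger."""
    return [(tokens, labels) for tokens, labels in data
            if score(tokens) >= threshold]


def precision_at_base_rate(base_rate: float, tpr: float, fpr: float) -> float:
    """Expected precision of a classifier with fixed true-positive rate `tpr`
    and false-positive rate `fpr` when positives make up `base_rate` of the
    examples: P = pi*TPR / (pi*TPR + (1 - pi)*FPR)."""
    true_pos = base_rate * tpr
    false_pos = (1.0 - base_rate) * fpr
    denom = true_pos + false_pos
    return true_pos / denom if denom > 0 else 0.0
```

With a hypothetical TPR of 0.9 and FPR of 0.01, `precision_at_base_rate` gives about 0.48 at a 1% base rate but about 0.91 at a 10% base rate. When positives are rare, even a small false-positive rate lets negatives dominate the predictions, which is one way to see why filtering test data can matter so much at low base rates.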

