Improved Naive Bayes with Mislabeled Data - Zhuanzhi Papers

Naive Bayes · Naive Bayes algorithm · Bayes · Naive Bayes method · Bayesian methods

April 13, 2023

Improved Naive Bayes with Mislabeled Data

Qianhan Zeng, Yingqiu Zhu, Xuening Zhu, Feifei Wang, Weichen Zhao, Shuning Sun, Meng Su, Hansheng Wang

Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively by using an EM algorithm. Our simulation and experiment results show that the improved Naive Bayes method greatly improves the performances of the Naive Bayes method with mislabeled data.
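The abstract does not spell out the label-generating mechanism, so the following is only a minimal sketch of the general idea under stated assumptions: the true label z is latent, the observed label y is drawn from a class-conditional noise matrix T[z][y] = P(y | z), and features are binary and conditionally independent given z (Bernoulli Naive Bayes). The function name `em_noisy_nb`, its parameters, and the synthetic data are hypothetical and are not taken from the paper.

```python
import math
import random

def em_noisy_nb(X, y, n_classes, n_iter=50, smooth=1.0, init_trust=0.8):
    """EM for Bernoulli Naive Bayes with a latent true label z and observed label y.

    Assumed noise model (not from the paper): P(y | z) = T[z][y], independent of X given z.
    Requires n_classes >= 2. Returns (pi, theta, T, q), where q[i][z] is the
    posterior probability that sample i has true label z.
    """
    n, d, K = len(X), len(X[0]), n_classes
    # Initialise posteriors by trusting each observed label with weight init_trust.
    rest = (1.0 - init_trust) / (K - 1)
    q = [[init_trust if z == y[i] else rest for z in range(K)] for i in range(n)]
    for _ in range(n_iter):
        # M-step: smoothed, posterior-weighted estimates of all parameters.
        nz = [sum(q[i][z] for i in range(n)) for z in range(K)]
        pi = [(nz[z] + smooth) / (n + K * smooth) for z in range(K)]
        theta = [[(sum(q[i][z] * X[i][j] for i in range(n)) + smooth) / (nz[z] + 2 * smooth)
                  for j in range(d)] for z in range(K)]
        T = [[(sum(q[i][z] for i in range(n) if y[i] == c) + smooth) / (nz[z] + K * smooth)
              for c in range(K)] for z in range(K)]
        # E-step: posterior over the true label, computed in log space for stability.
        for i in range(n):
            logp = []
            for z in range(K):
                s = math.log(pi[z]) + math.log(T[z][y[i]])
                for j in range(d):
                    s += math.log(theta[z][j] if X[i][j] else 1.0 - theta[z][j])
                logp.append(s)
            m = max(logp)
            w = [math.exp(v - m) for v in logp]
            tot = sum(w)
            q[i] = [v / tot for v in w]
    return pi, theta, T, q

# Synthetic data (illustrative only): two classes, four binary features, 20% flipped labels.
random.seed(0)
true_theta = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]
Z = [i % 2 for i in range(200)]
X = [[int(random.random() < true_theta[z][j]) for j in range(4)] for z in Z]
y = [1 - z if random.random() < 0.2 else z for z in Z]

pi, theta, T, q = em_noisy_nb(X, y, n_classes=2)
z_hat = [max(range(2), key=lambda k: q[i][k]) for i in range(len(X))]
print("estimated noise matrix:", [[round(v, 2) for v in row] for row in T])
print("agreement with true labels:", sum(a == b for a, b in zip(z_hat, Z)) / len(Z))
```

Initialising the posteriors from the observed labels is one way to keep the latent classes aligned with the label names; other initialisations may converge to a permuted solution.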

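Related content

Naive Bayes

Naive Bayes is a classification method based on Bayes' theorem and the assumption that features are conditionally independent given the class. Given a training dataset, it first learns the joint probability distribution of the input and output under this conditional independence assumption. Then, for a given input x, it uses Bayes' theorem to find the output y with the largest posterior probability. Naive Bayes is simple to implement and efficient in both learning and prediction, which makes it a commonly used method.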
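As a minimal sketch of the procedure described above, the following fits a multinomial Naive Bayes text classifier with Laplace smoothing and predicts the class with the largest posterior. The function names `train_nb` and `predict_nb` and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes; alpha is the Laplace smoothing constant."""
    vocab = sorted({w for d in docs for w in d})
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    # Log prior P(y) and smoothed log likelihood P(word | y).
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """Return the class maximising log P(y) + sum of log P(word | y)."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy training set (illustrative only).
docs = [
    ["cheap", "pills", "buy"],
    ["meeting", "agenda", "notes"],
    ["buy", "cheap", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "ham", "spam", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, ["cheap", "buy"]))  # prints "spam"
```

Words that never appeared in training are skipped at prediction time, which is one simple choice among several.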

Related VIP content

[New book] Handbook of Digital Image Processing, 2nd edition, 2176 pdf, Mathematical Methods in Imaging

Zhuanzhi Member Services

93+ reads · February 12, 2020

[Machine learning tutorial] Applications of the Bioconductor MLInterfaces package to gene expression data

Zhuanzhi Member Services

18+ reads · January 11, 2020

Latest reinforcement learning tutorial, 17-page PDF

Zhuanzhi Member Services

182+ reads · October 11, 2019

Experience and advice for getting started with machine learning

Zhuanzhi Member Services

94+ reads · October 10, 2019

[Harvard Business School course, Fall 2019] Machine learning interpretability

Zhuanzhi Member Services

105+ reads · October 9, 2019

Related news

A brief look at contrastive learning

Jishi Platform (极市平台)

2+ reads · July 26, 2022

Transferring Knowledge across Learning Processes

CreateAMind

29+ reads · May 18, 2019

Unsupervised Learning via Meta-Learning

CreateAMind

44+ reads · January 3, 2019

[Recommended] (Python) Tweet sentiment analysis with multiple models (Naive Bayes, SVM, CNN, LSTM, etc.)

Machine Learning Research Society (机器学习研究会)

13+ reads · December 25, 2017

[Paper] A summary of variational inference

Machine Learning Research Society (机器学习研究会)

39+ reads · November 16, 2017

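Related grants

Integrative analysis of multiple ranking data

National Natural Science Foundation of China

0+ reads · December 31, 2015

Uncertainty analysis and data assimilation methods for water and sediment transport models of alluvial river processes

National Natural Science Foundation of China

0+ reads · December 31, 2013

Statistical dimensionality reduction and algorithms for high-dimensional data in biometric recognition

National Natural Science Foundation of China

0+ reads · December 31, 2012

Interface-state luminescence of quantum dot/functional group composite systems

National Natural Science Foundation of China

0+ reads · December 31, 2009

Comparative genomic analysis of gene evolution in rice and sorghum

National Natural Science Foundation of China

0+ reads · December 31, 2009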

Related papers

How to Sift Out a Clean Data Subset in the Presence of Data Poisoning?

arXiv

0+ reads · May 31, 2023

Dynamic Factor Models for Binary Data in Circular Spaces: An Application to the U.S. Supreme Court

arXiv

0+ reads · May 30, 2023

Bayesian approach to Gaussian process regression with uncertain inputs

arXiv

0+ reads · May 28, 2023

Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods

arXiv

22+ reads · April 30, 2022

Data-Free Knowledge Distillation for Heterogeneous Federated Learning

arXiv

12+ reads · June 9, 2021
