防止数据集从断断机学习生物标志物的数据集转移 (Preventing dataset shift from breaking machine-learning biomarkers) - 专知论文

会员服务 ·

0

数据集 · Machine Learning · 学成 · 计算学习理论 · 统计理论 ·

2021 年 7 月 21 日

Preventing dataset shift from breaking machine-learning biomarkers

翻译：防止数据集从断断机学习生物标志物的数据集转移

Jéroôme Dockès,Gaël Varoquaux,Jean-Baptiste Poline

from arxiv, GigaScience, BioMed Central, In press

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts breaks machine-learning extracted biomarkers, as well as detection and correction strategies.

翻译：机器学习带来了寻找从生物医学计量丰富的组群中提取的新生物标志的希望。好的生物标志是一种能够可靠地检测相应条件的生物标志。然而,生物标志往往是从与目标人口不同的组群中提取的。这种不匹配,被称为数据集变换,会破坏生物标志对新人的应用。在生物医学研究中,数据集变换经常发生,例如由于招聘偏差。当数据集变换发生时,标准机器学习技术不足以提取和验证生物标志。本文章概述了数据集变换时间和如何打破机器学习提取生物标志,以及检测和校正战略。

0

相关内容

数据集

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

专知会员服务

87+阅读 · 2021年1月11日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

【健康医疗中的机器学习算法综述】A Survey Of Machine Learning Algorithms In Health Care

【健康医疗中的机器学习算法综述】A Survey Of Machine Learning Algorithms In Health Care

专知会员服务

14+阅读 · 2019年11月19日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

245+阅读 · 2019年10月21日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

教你构建机器学习项目：吴恩达新书《Machine Learning Yearning》

教你构建机器学习项目：吴恩达新书《Machine Learning Yearning》

专知

20+阅读 · 2018年4月6日

已删除

将门创投

9+阅读 · 2017年10月17日

Andrew NG的新书《Machine Learning Yearning》

Andrew NG的新书《Machine Learning Yearning》

我爱机器学习

11+阅读 · 2016年12月7日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

36+阅读 · 2021年8月2日

Machine Reasoning Explainability

Arxiv

14+阅读 · 2020年9月1日

Meta Learning for Causal Direction

Meta Learning for Causal Direction

Arxiv

5+阅读 · 2020年7月6日

Early-Learning Regularization Prevents Memorization of Noisy Labels

Early-Learning Regularization Prevents Memorization of Noisy Labels

Arxiv

3+阅读 · 2020年6月30日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

Causality for Machine Learning

Arxiv

26+阅读 · 2019年11月24日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

A Survey of Learning Causality with Data: Problems and Methods

A Survey of Learning Causality with Data: Problems and Methods

Arxiv

19+阅读 · 2018年9月25日

Knowledge Based Machine Reading Comprehension

Knowledge Based Machine Reading Comprehension

Arxiv

4+阅读 · 2018年9月12日

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Arxiv

4+阅读 · 2017年11月15日

VIP会员

文章信息

相关主题

Machine Learning

计算学习理论

相关VIP内容

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

【经典书】机器学习白话书，97页pdf，Machine Learning for Humans

专知会员服务

87+阅读 · 2021年1月11日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

【新书】Python机器学习实战，545页pdf，Practical Machine Learning with Python

专知会员服务

310+阅读 · 2020年2月26日

【健康医疗中的机器学习算法综述】A Survey Of Machine Learning Algorithms In Health Care

【健康医疗中的机器学习算法综述】A Survey Of Machine Learning Algorithms In Health Care

专知会员服务

14+阅读 · 2019年11月19日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

245+阅读 · 2019年10月21日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《2024年度美国防部作战测试与评估报告》500页

《面相未来作战空中系统中有人-无人编组的AI驱动协作模式选择》含slides

无人机编队飞行：复杂环境中作战的策略、挑战与应用

《探索军事背景下共享大语言模型：AI助手与智能体部署中可扩展性与效率的早期洞察》（含44页slides）

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

教你构建机器学习项目：吴恩达新书《Machine Learning Yearning》

教你构建机器学习项目：吴恩达新书《Machine Learning Yearning》

专知

20+阅读 · 2018年4月6日

已删除

将门创投

9+阅读 · 2017年10月17日

Andrew NG的新书《Machine Learning Yearning》

Andrew NG的新书《Machine Learning Yearning》

我爱机器学习

11+阅读 · 2016年12月7日

相关论文

A Survey of Human-in-the-loop for Machine Learning

Arxiv

36+阅读 · 2021年8月2日

Machine Reasoning Explainability

Arxiv

14+阅读 · 2020年9月1日

Meta Learning for Causal Direction

Meta Learning for Causal Direction

Arxiv

5+阅读 · 2020年7月6日

Early-Learning Regularization Prevents Memorization of Noisy Labels

Early-Learning Regularization Prevents Memorization of Noisy Labels

Arxiv

3+阅读 · 2020年6月30日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

Causality for Machine Learning

Arxiv

26+阅读 · 2019年11月24日

A Comprehensive Survey on Transfer Learning

A Comprehensive Survey on Transfer Learning

Arxiv

121+阅读 · 2019年11月7日

A Survey of Learning Causality with Data: Problems and Methods

A Survey of Learning Causality with Data: Problems and Methods

Arxiv

19+阅读 · 2018年9月25日

Knowledge Based Machine Reading Comprehension

Knowledge Based Machine Reading Comprehension

Arxiv

4+阅读 · 2018年9月12日

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Arxiv

4+阅读 · 2017年11月15日

微信扫码咨询专知VIP会员