We investigate different natural language processing (NLP) approaches based on contextualised word representations for the early prediction of lung cancer from the free-text medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how \textit{soft prompt-tuning}, an NLP technique for adapting PLMs with small amounts of training data, compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust than PLMs in highly imbalanced settings; and 3) how the models fare when trained on notes from a small number of patients. We find that 1) soft prompt-tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration than simpler static WEMs as the classification problem becomes more imbalanced; and 3) results when training on a small number of patients are mixed, with no clear differences between PLMs and WEMs. All our code is available open source at \url{https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/}.
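For readers unfamiliar with the technique, the following is a minimal sketch of soft prompt-tuning for binary note classification, assuming a PyTorch/Hugging Face setup. The model name, prompt length, learning rate, and classification head here are illustrative assumptions, not the configuration used in our experiments; see the repository above for the actual implementation.

\begin{verbatim}
# Minimal soft prompt-tuning sketch (illustrative assumptions, not the
# experimental setup). The PLM is kept frozen; only a small matrix of
# "soft prompt" embeddings and a linear head are trained.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL = "GroNLP/bert-base-dutch-cased"  # assumed Dutch PLM; any encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
plm = AutoModel.from_pretrained(MODEL)
for p in plm.parameters():
    p.requires_grad = False  # freeze all PLM weights

N_PROMPT = 20  # number of soft prompt tokens (assumed hyperparameter)
hidden = plm.config.hidden_size
soft_prompt = nn.Parameter(0.02 * torch.randn(N_PROMPT, hidden))
head = nn.Linear(hidden, 2)  # binary head: lung cancer vs. no lung cancer

def logits_for(texts):
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=512 - N_PROMPT)
    tok = plm.get_input_embeddings()(enc["input_ids"])       # (B, T, H)
    prompt = soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
    embeds = torch.cat([prompt, tok], dim=1)                 # prepend prompt
    mask = torch.cat([torch.ones(tok.size(0), N_PROMPT,
                                 dtype=enc["attention_mask"].dtype),
                      enc["attention_mask"]], dim=1)
    out = plm(inputs_embeds=embeds, attention_mask=mask)
    cls = out.last_hidden_state[:, N_PROMPT]  # [CLS] sits after the prompt
    return head(cls)

# Only the soft prompt and the head receive gradients.
optimizer = torch.optim.AdamW(
    [soft_prompt] + list(head.parameters()), lr=1e-3)
\end{verbatim}

In this setup the number of trainable parameters is only N_PROMPT x hidden plus the head, orders of magnitude fewer than in full fine-tuning, which is what makes soft prompt-tuning attractive when labelled training data are scarce.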