Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists. However, simultaneously establishing global correspondences between an image (e.g., a chest X-ray) and its corresponding report, and local alignments between image patches and keywords, remains challenging. To this end, we propose a Unify, Align and then Refine (UAR) approach to learn multi-level cross-modal alignments, and introduce three novel modules: the Latent Space Unifier (LSU), the Cross-modal Representation Aligner (CRA), and the Text-to-Image Refiner (TIR). Specifically, LSU unifies multimodal data into discrete tokens, making it straightforward to learn knowledge common to both modalities with a shared network. The modality-agnostic CRA first learns discriminative features via a set of orthonormal bases and a dual-gate mechanism, and then globally aligns visual and textual representations under a triplet contrastive loss. TIR boosts token-level local alignment by calibrating text-to-image attention with a learnable mask. Additionally, we design a two-stage training procedure that lets UAR gradually grasp cross-modal alignments at different levels, imitating radiologists' workflow: first writing sentence by sentence, then checking word by word. Extensive experiments and analyses on the IU-Xray and MIMIC-CXR benchmark datasets demonstrate the superiority of our UAR over a range of state-of-the-art methods.
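To make the TIR idea concrete, the following is a minimal sketch (not the authors' code) of text-to-image attention calibration: standard cross-attention from report tokens to image patches, with a learnable mask added to the attention logits before the softmax. All module and parameter names here (e.g., `CalibratedCrossAttention`, `self.mask`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CalibratedCrossAttention(nn.Module):
    """Sketch of text-to-image attention with a learnable calibration mask."""

    def __init__(self, dim: int, num_patches: int, max_text_len: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)  # text tokens -> queries
        self.k = nn.Linear(dim, dim)  # image patches -> keys
        self.v = nn.Linear(dim, dim)  # image patches -> values
        # Learnable mask over (text position, image patch) pairs,
        # added to the attention logits to calibrate local alignment.
        self.mask = nn.Parameter(torch.zeros(max_text_len, num_patches))

    def forward(self, text_feats: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (B, T, dim); image_feats: (B, P, dim)
        q, k, v = self.q(text_feats), self.k(image_feats), self.v(image_feats)
        logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5  # (B, T, P)
        logits = logits + self.mask[: q.size(1)]               # calibration term
        attn = F.softmax(logits, dim=-1)
        return attn @ v                                        # (B, T, dim)
```

The design choice sketched here (an additive, learnable bias on the attention logits) is one plausible reading of "calibrating text-to-image attention with a learnable mask"; the paper's actual parameterization may differ.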