RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

Pierre Chambon, Christian Bluethgen, Jean-Benoit Delbrouck, Rogier Van der Sluijs, Małgorzata Połacin, Juan Manuel Zambrano Chaves, Tanishq Mathew Abraham, Shivanshu Purohit, Curtis P. Langlotz, Akshay Chaudhari

From arXiv, 19 pages

Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data is fundamentally different from natural images, and the language used to succinctly capture relevant details in medical data uses a different, narrow but semantically rich, domain-specific vocabulary. Not surprisingly, multimodal models trained on natural image-text pairs do not tend to generalize well to the medical domain. Developing generative imaging models that faithfully represent medical concepts while providing compositional diversity could mitigate the existing paucity of high-quality, annotated medical imaging datasets. In this work, we develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest X-rays (CXR) and their corresponding radiology (text) reports. We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts. We assess the model outputs quantitatively using image quality metrics, and have human domain experts evaluate image quality and text-image alignment. We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images, and that the output can be controlled to a new extent by using free-form text prompts including radiology-specific language. Fine-tuning this model on a fixed training set and using it as a data augmentation method, we measure a 5% improvement in a classifier trained jointly on synthetic and real images, and a 3% improvement when it is trained on a larger but purely synthetic training set. Finally, we observe that this fine-tuning distills in-domain knowledge into the text encoder and can improve its ability to represent certain diseases such as pneumothorax by 25%.
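The data augmentation comparison described in the abstract (a classifier trained on real images, on real plus synthetic images, or on synthetic images only) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Sample` tuple layout, the function name `build_training_set`, the mode names, and the file names are hypothetical and are not taken from the paper or its code.

```python
import random
from typing import List, Tuple

# Hypothetical sample layout: (image_path, label, source),
# where source is "real" or "synthetic".
Sample = Tuple[str, int, str]


def build_training_set(
    real: List[Sample],
    synthetic: List[Sample],
    mode: str,
    seed: int = 0,
) -> List[Sample]:
    """Assemble one of three training pools and shuffle it reproducibly.

    mode == "real":      real images only (baseline)
    mode == "joint":     real and synthetic images together
    mode == "synthetic": synthetic images only
    """
    if mode == "real":
        pool = list(real)
    elif mode == "joint":
        pool = list(real) + list(synthetic)
    elif mode == "synthetic":
        pool = list(synthetic)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # A seeded generator keeps the shuffle order identical across runs.
    random.Random(seed).shuffle(pool)
    return pool


if __name__ == "__main__":
    # Placeholder file names; no real data is loaded here.
    real = [(f"real_{i}.png", i % 2, "real") for i in range(4)]
    synthetic = [(f"syn_{i}.png", i % 2, "synthetic") for i in range(8)]
    for mode in ("real", "joint", "synthetic"):
        print(mode, len(build_training_set(real, synthetic, mode)))
```

Each pool would then be passed to the same classifier training routine, so that any difference in held-out performance can be attributed to the composition of the training data rather than to the training setup.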

