Gathering manually annotated images for training a predictive model is far more challenging in the medical domain than for natural images, as it requires the expertise of qualified radiologists. We therefore propose to take advantage of past radiological exams (specifically, knee X-ray examinations) and formulate a framework that learns the correspondence between images and reports, and can therefore generate diagnostic reports for a given X-ray examination comprising an arbitrary number of image views. We demonstrate that aggregating the image features of each exam and using them as conditional inputs when training a language generation model yields auto-generated exam reports that correlate well with radiologist-generated reports.
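The aggregation step described above can be sketched as follows. This is a minimal NumPy illustration with made-up dimensions, not the paper's actual architecture: per-view feature vectors from some image encoder are mean-pooled into a single exam-level vector (the pooling choice is an assumption here), which can then condition each decoding step, e.g. by concatenation with the current word embedding.

```python
import numpy as np

def aggregate_views(view_features):
    # view_features: (num_views, feat_dim) -- one row per X-ray view.
    # Mean pooling yields one fixed-size exam-level vector regardless
    # of how many views the exam contains (an assumed choice here).
    return view_features.mean(axis=0)

def conditioned_decoder_input(exam_feature, word_embedding):
    # Concatenate the exam-level image feature with the current word
    # embedding so every generation step sees the image context.
    return np.concatenate([exam_feature, word_embedding])

rng = np.random.default_rng(0)
feat_dim, embed_dim = 512, 256

# One exam with 3 views, another with 5: both reduce to feat_dim,
# which is what lets the model handle an arbitrary number of views.
exam_a = aggregate_views(rng.standard_normal((3, feat_dim)))
exam_b = aggregate_views(rng.standard_normal((5, feat_dim)))

step_input = conditioned_decoder_input(exam_a, rng.standard_normal(embed_dim))
print(step_input.shape)  # (768,)
```

In a full model the concatenated vector would feed an RNN or transformer decoder at each step; the key point the sketch shows is that pooling makes the conditioning vector's size independent of the number of views.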