Inspired by curriculum learning, we propose a consecutive (i.e., image-to-text-to-text) generation framework that divides radiology report generation into two steps. Rather than generating the full radiology report from the image in a single pass, the model first generates global concepts from the image and then refines them into finer-grained, coherent text using a transformer-based architecture. Each step follows the transformer-based sequence-to-sequence paradigm. Our approach improves upon the state of the art on two benchmark datasets.
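The two-step (image-to-text-to-text) flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TwoStepReportGenerator`, the type aliases, and the two toy stand-in models are hypothetical, and in practice each stage would be a trained transformer-based sequence-to-sequence model rather than a hand-written function.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type aliases (assumptions for illustration): an "image" is
# any sequence of floats, standing in for pixels or visual features.
Image = Sequence[float]
ImageToText = Callable[[Image], str]  # step 1: image -> global concepts
TextToText = Callable[[str], str]     # step 2: concepts -> full report


@dataclass
class TwoStepReportGenerator:
    """Chains two stages: image -> concepts -> report."""

    concept_model: ImageToText
    report_model: TextToText

    def generate(self, image: Image) -> str:
        concepts = self.concept_model(image)  # coarse, global concepts
        return self.report_model(concepts)    # finer, coherent text


# Toy stand-ins for the two trained models (not real models).
def toy_concept_model(image: Image) -> str:
    mean = sum(image) / len(image)
    return "opacity" if mean > 0.5 else "clear lungs"


def toy_report_model(concepts: str) -> str:
    return f"Findings: {concepts}."


generator = TwoStepReportGenerator(toy_concept_model, toy_report_model)
report = generator.generate([0.9, 0.8])
print(report)
```

Keeping the two stages behind plain callables makes the intermediate concept text inspectable and lets either stage be swapped or trained independently, which is the practical appeal of splitting the task into two steps.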