Background: The current clinical workflow for esophageal gross tumor volume (GTV) contouring relies on manual delineation, which is labor-intensive and subject to interuser variability. Purpose: To validate the clinical applicability of a deep learning (DL) multi-modality esophageal GTV contouring model that was developed at one institution and tested at multiple institutions. Methods and Materials: We collected data from 606 esophageal cancer patients at four institutions. The 252 patients from institution 1 had a treatment planning CT (pCT) and a paired diagnostic FDG-PET/CT scan; the 354 patients from the other three institutions had only pCT. A two-stream DL model for GTV segmentation was developed using the pCT and PET/CT scans of a 148-patient subset from institution 1. The resulting model can segment GTVs from pCT alone or from pCT and PET/CT combined. For independent evaluation, the remaining 104 institution-1 patients served as an unseen internal test set, and the 354 patients from institutions 2-4 were used for external testing. Human experts rated the degree of manual revision required, to assess the contour-editing effort. The performance of the deep model was compared against four radiation oncologists in a multiuser study with 20 randomly selected external patients. Contouring accuracy and time were recorded for the pre- and post-DL-assisted delineation processes. Results: Our model achieved high segmentation accuracy in internal testing (mean Dice similarity coefficient [DSC]: 0.81 using pCT and 0.83 using pCT+PET) and generalized well to external evaluation (mean DSC: 0.80). Expert assessment showed that the predicted contours of 88% of patients needed only minor or no revision. In the multiuser evaluation, DL assistance reduced inter-observer variation and required contouring time by 37.6% and 48.0%, respectively. Conclusions: DL-predicted GTV contours were in close agreement with the ground truth and could be adopted clinically with mostly minor or no changes.
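The Dice scores reported above measure the overlap between a predicted contour and a reference contour. As a minimal sketch of how such a score is typically computed, and not the authors' evaluation code, the following assumes binary masks flattened into 0/1 sequences; the function name `dice_score` and the toy masks are hypothetical, and treating two empty masks as perfect agreement is a convention assumed here.

```python
def dice_score(pred, truth):
    """Dice similarity coefficient, 2*|A∩B| / (|A| + |B|), for two binary masks.

    Both masks are flat sequences of 0/1 values with one entry per voxel
    (an assumed representation, chosen to keep the sketch dependency-free).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Toy one-dimensional "masks" (hypothetical data, for illustration only).
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 0, 1, 1, 1, 0]
score = dice_score(predicted, reference)
print(round(score, 3))
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels; a mean score would be obtained by averaging per-patient values.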