Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding

From arXiv. Manuscript accepted in the International Journal of Computer Assisted Radiology and Surgery. Code available: https://github.com/lalithjets/Domain-adaptation-in-MTL

Purpose: Surgical scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries, with inter- and intra-patient variation and the appearance of novel instruments, degrade the performance of model prediction. Moreover, these tasks require outputs from multiple models, which can be computationally expensive and affect real-time performance. Methodology: A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with domain shift problems. The model consists of a shared feature extractor, a mesh-transformer branch for captioning and a graph attention branch for tool-tissue interaction prediction. The shared feature extractor employs class incremental contrastive learning (CICL) to tackle intensity shift and novel class appearance in the target domain. We design Laplacian of Gaussian (LoG) based curriculum learning into both the shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally. Results: The proposed MTL model, trained using task-aware optimization and fine-tuning techniques, reported a balanced performance (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) for both tasks on the target domain and performed on par with single-task models in domain adaptation. Conclusion: The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models.
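The abstract mentions Laplacian of Gaussian (LoG) based curriculum learning but gives no implementation details. As a rough illustration only, the sketch below builds a discrete LoG kernel (up to a constant factor) and a decaying sigma schedule. The function names `log_kernel` and `sigma_schedule`, the decay parameters, and the idea of shrinking sigma as training proceeds are assumptions made for this example, not the authors' published method; see the linked repository for the actual code.

```python
import math


def log_kernel(sigma, size=None):
    """Build a 2-D Laplacian-of-Gaussian kernel as a list of rows.

    Hypothetical helper for illustration. The kernel follows
    ((r^2 - 2*sigma^2) / sigma^4) * exp(-r^2 / (2*sigma^2)),
    omitting the constant normalization factor. `size` is assumed odd.
    """
    if size is None:
        size = 2 * math.ceil(3 * sigma) + 1  # cover roughly +/- 3 sigma
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            value = (r2 - 2 * sigma ** 2) / sigma ** 4
            row.append(value * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    # Subtract the mean so the truncated, discretized kernel sums to zero.
    n = len(kernel)
    mean = sum(map(sum, kernel)) / (n * n)
    return [[v - mean for v in row] for row in kernel]


def sigma_schedule(epoch, sigma0=1.0, decay=0.9, every=5):
    """Assumed curriculum: shrink sigma by `decay` once every `every` epochs."""
    return sigma0 * decay ** (epoch // every)


if __name__ == "__main__":
    for epoch in (0, 5, 10):
        sigma = sigma_schedule(epoch)
        kernel = log_kernel(sigma)
        print(epoch, round(sigma, 3), len(kernel))
```

In a setup like this, the kernel would typically be applied as a fixed convolution over feature maps, with the schedule controlling how strongly the features are filtered at each stage of training.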
