This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, hysicians rely on grayscale ultrasound images to determine the location of lesions. However, these images often contain significant noise and can be influenced by surrounding tissues or blood vessels, making identification challenging. Previous research has lacked the application of object detection models to EBUS-TBNA, and there has been no well-defined solution for the lack of annotated data in the EBUS-TBNA dataset. In related studies on ultrasound images, although models have been successful in capturing target regions for their respective tasks, their training and predictions have been based on two-dimensional images, limiting their ability to leverage temporal features for improved predictions. This study introduces a three-dimensional video-based object detection model. It first generates a set of improved queries using a diffusion model, then captures temporal correlations through an attention mechanism. A filtering mechanism selects relevant information from previous frames to pass to the current frame. Subsequently, a teacher-student model training approach is employed to further optimize the model using unlabeled data. By incorporating various data augmentation and feature alignment, the model gains robustness against interference. Test results demonstrate that this model, which captures spatiotemporal information and employs semi-supervised learning methods, achieves an Average Precision (AP) of 48.7 on the test dataset, outperforming other models. It also achieves an Average Recall (AR) of 79.2, significantly leading over existing models.
翻译:暂无翻译