Breast lesion detection in ultrasound is critical for breast cancer diagnosis. Existing methods mainly rely on individual 2D ultrasound images, or combine unlabeled videos with labeled 2D images, to train models for breast lesion detection. In this paper, we first collect and annotate an ultrasound video dataset (188 videos) for breast lesion detection. Moreover, we propose a clip-level and video-level feature aggregated network (CVA-Net) to detect breast lesions in ultrasound videos by aggregating video-level lesion classification features and clip-level temporal features. The clip-level temporal features encode local temporal information from ordered video frames and global temporal information from shuffled video frames. In our CVA-Net, an inter-video fusion module is devised to fuse local features from the original video frames with global features from the shuffled video frames, and an intra-video fusion module is devised to learn the temporal information among adjacent video frames. Moreover, we learn video-level features to classify the breast lesions of the original video as benign or malignant, further enhancing the final breast lesion detection performance in ultrasound videos. Experimental results on our annotated dataset demonstrate that our CVA-Net clearly outperforms state-of-the-art methods. The corresponding code and dataset are publicly available at \url{https://github.com/jhl-Det/CVA-Net}.
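To make the aggregation idea concrete, the following is a toy, pure-Python sketch of the two fusion stages described above: extracting clip-level features from ordered (local) and shuffled (global) frames, fusing them across videos, then aggregating adjacent-frame information within a video. All function names, the per-frame mean-intensity "features", and the averaging operations are illustrative assumptions for exposition only; the actual CVA-Net uses learned modules, not these hand-coded averages.

```python
import random

def clip_features(frames):
    # Toy "feature extractor": per-frame mean intensity
    # (a stand-in for a learned CNN backbone).
    return [sum(f) / len(f) for f in frames]

def inter_video_fusion(local_feats, global_feats, alpha=0.5):
    # Fuse local features (ordered frames) with global features
    # (shuffled frames); alpha is a hypothetical mixing weight.
    return [alpha * l + (1 - alpha) * g
            for l, g in zip(local_feats, global_feats)]

def intra_video_fusion(feats, window=3):
    # Aggregate temporal information among adjacent frames with a
    # simple moving average over a small window.
    fused = []
    for i in range(len(feats)):
        lo = max(0, i - window // 2)
        hi = min(len(feats), i + window // 2 + 1)
        fused.append(sum(feats[lo:hi]) / (hi - lo))
    return fused

# Example: a "video" of 5 frames, each a list of 4 pixel intensities.
video = [[0.1, 0.2, 0.3, 0.4],
         [0.2, 0.3, 0.4, 0.5],
         [0.3, 0.4, 0.5, 0.6],
         [0.4, 0.5, 0.6, 0.7],
         [0.5, 0.6, 0.7, 0.8]]

rng = random.Random(0)
shuffled = video[:]
rng.shuffle(shuffled)

local_feats = clip_features(video)      # temporal order preserved
global_feats = clip_features(shuffled)  # order-agnostic statistics
fused = intra_video_fusion(inter_video_fusion(local_feats, global_feats))
```

The sketch only conveys the data flow (ordered and shuffled views fused, then neighboring frames aggregated); in the paper these steps are learned end-to-end together with the video-level benign/malignant classification branch.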