We introduce an improvement to the feature pyramid network of standard object detection models. We call our method enhanced featuRE Pyramid network by Local Image translation and Conjunct Attention, or REPLICA. REPLICA improves object detection performance by simultaneously (1) generating realistic but fake images with simulated objects to mitigate the data-hungry nature of the attention mechanism, and (2) advancing the detection model architecture through a novel modification of attention on image feature patches. Specifically, we use a convolutional autoencoder as a generator that creates new images by injecting objects into existing images through local interpolation and reconstruction of the features extracted in its hidden layers. Then, taking advantage of the larger number of simulated images, we use a visual transformer to enhance the outputs of each ResNet layer, which serve as inputs to a feature pyramid network. We apply our methodology to the problem of detecting lesions in Digital Breast Tomosynthesis (DBT) scans, a high-resolution medical imaging modality crucial in breast cancer screening. Our experimental results demonstrate, both qualitatively and quantitatively, that REPLICA can improve the accuracy of tumor detection using our enhanced version of a standard object detection framework.
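The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the two ideas it names, under stated assumptions. `inject_local_features` illustrates local interpolation in feature space as a simple linear blend of an object's hidden features into a region of a background feature map. `patch_self_attention` illustrates plain single-head self-attention with a residual connection over the spatial positions of one backbone feature map. All function names, the `alpha` weight, and the random projection matrices are hypothetical. The autoencoder's encoder and decoder, the ResNet backbone, the feature pyramid network, and the specific "conjunct attention" modification are not reproduced here.

```python
import numpy as np


def inject_local_features(background, obj, top, left, alpha=0.5):
    """Blend an object's feature patch into a background feature map.

    background: (C, H, W) hidden features of the target image.
    obj:        (C, h, w) hidden features of the object to inject.
    alpha:      hypothetical interpolation weight; 0 keeps the background,
                1 replaces the region with the object features.
    """
    c, h, w = obj.shape
    if background.shape[0] != c:
        raise ValueError("channel counts must match")
    if (top < 0 or left < 0
            or top + h > background.shape[1]
            or left + w > background.shape[2]):
        raise ValueError("object patch must fit inside the background")
    out = background.copy()  # leave the caller's array untouched
    region = out[:, top:top + h, left:left + w]
    out[:, top:top + h, left:left + w] = (1.0 - alpha) * region + alpha * obj
    return out


def patch_self_attention(features, seed=0):
    """Single-head self-attention over spatial positions, with a residual.

    features: (C, H, W); each spatial position is treated as one token.
    The projection weights are random placeholders, not trained values.
    """
    c, h, w = features.shape
    tokens = features.reshape(c, h * w).T            # (N, C), N = H * W
    rng = np.random.default_rng(seed)
    wq, wk, wv = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(c)                    # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    attended = weights @ v                           # (N, C)
    return (tokens + attended).T.reshape(c, h, w)    # back to (C, H, W)
```

In a full pipeline along the lines the abstract describes, the blended features would be passed through the autoencoder's decoder to reconstruct a simulated image, and the attention step would be applied to each backbone stage's output before it enters the feature pyramid network.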