CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector

From arXiv; 9 pages, 5 figures; conference: IEEE ICTAI. Acknowledgment: The research reported in this paper was supported in part by the National Natural Science Foundation of China under grant 91746203 and the Outstanding Academic Leader Project of Shanghai under grant No. 20XD1401700.

Owing to the success of Bidirectional Encoder Representations from Transformers (BERT) in natural language processing (NLP), the multi-head attention Transformer has become increasingly prevalent in computer vision (CV) research. However, it remains challenging to apply Transformers to complex tasks such as visual detection and semantic segmentation. Although several Transformer-based architectures, such as DETR and ViT-FRCNN, have been proposed for object detection, they inevitably suffer from reduced discrimination accuracy and computational efficiency because of the enormous number of learnable parameters and the heavy computational complexity incurred by the traditional self-attention operation. To alleviate these issues, we present a novel object detection architecture, named Convolutional vision Transformer Based Attentive Single Shot MultiBox Detector (CvT-ASSD), that is built on top of the Convolutional vision Transformer (CvT) with the efficient Attentive Single Shot MultiBox Detector (ASSD). We provide comprehensive empirical evidence showing that our model, CvT-ASSD, can achieve good system efficiency and performance while being pretrained on large-scale detection datasets such as PASCAL VOC and MS COCO. Code has been released in a public GitHub repository at https://github.com/albert-jin/CvT-ASSD.
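To make the cost argument concrete, here is a minimal sketch (not taken from the paper) of how the self-attention score matrix grows with input size. It assumes a ViT-style tokenization in which the image is split into non-overlapping square patches; the function name `attention_matrix_entries` and the default 16-pixel patch size are illustrative assumptions.

```python
def attention_matrix_entries(height: int, width: int, patch: int = 16) -> int:
    """Number of pairwise scores one attention head computes for one image.

    Assumes non-overlapping square patches (ViT-style tokenization). This is
    an illustrative assumption, not a detail taken from the paper.
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = (height // patch) * (width // patch)
    # Every token attends to every token, so the score matrix is tokens x tokens.
    return tokens * tokens


if __name__ == "__main__":
    for side in (224, 448, 896):
        print(side, attention_matrix_entries(side, side))
```

Under these assumptions, doubling the image side quadruples the token count and multiplies the number of attention scores by 16, which is the quadratic growth that makes plain self-attention expensive for detection-sized inputs.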


Attention mechanism

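The attention mechanism was first proposed in the field of visual imaging, but it arguably only became popular with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment simultaneously on machine translation tasks; their work is regarded as the first to apply attention to NLP. Extensions of attention-based RNN models then began to be applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the rough trend of progress in attention research.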
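As a rough illustration of the idea, the sketch below computes soft attention for a single query: each key is scored against the query, the scores are normalized with a softmax, and the output is the weighted average of the values. It uses scaled dot-product scoring for brevity, which is a simplification; Bahdanau et al. use a learned additive score instead. The helper names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (weights, context) for one query over paired key/value vectors.

    Scoring is a scaled dot product, chosen here for brevity (an assumption,
    not the scoring function of any specific paper cited above).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The context vector is the attention-weighted average of the values.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context
```

The weights always sum to one, so they can be read as a soft alignment: a key that matches the query more closely contributes more of its value to the context vector.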
