Recently, vision transformers have started to show impressive results, significantly outperforming large convolution-based models. However, for small models targeting mobile or resource-constrained devices, ConvNets still hold advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which has a global receptive field while still producing location-sensitive features, as local convolutions do. We combine ParCs and squeeze-excitation ops to form a MetaFormer-like model block, which additionally provides a transformer-like attention mechanism. This block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experimental results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision transformer based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters. Compared with MobileViT, it saves 11% of the parameters and 13% of the computational cost while gaining 0.2% accuracy and 23% faster inference speed (on an ARM-based Rockchip RK3288). Compared with DeiT, it uses only half the parameters while gaining 2.7% accuracy. On MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance. Source code is available at https://github.com/hkzhang91/ParC-Net
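To make the ParC idea concrete, the following is a minimal one-dimensional sketch in plain Python, not the released implementation. It rests on assumptions that the abstract does not state: the kernel is taken to be as long as the input (which gives every output a global receptive field), wrap-around indexing stands in for the circular convolution, and location sensitivity is obtained by adding a position embedding to the input before the convolution. The names `circular_conv1d`, `parc_1d`, and `roll` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Sequence


def roll(values: Sequence[float], shift: int = 1) -> List[float]:
    """Cyclically shift a sequence to the right by `shift` positions."""
    values = list(values)
    shift %= len(values)
    return values[-shift:] + values[:-shift] if shift else values


def circular_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Circular cross-correlation with a kernel as long as the input.

    Indices wrap around, so each output position combines every input
    position: a global receptive field with only len(x) weights.
    """
    n = len(x)
    if len(kernel) != n:
        raise ValueError("this sketch assumes the kernel spans the whole input")
    return [sum(kernel[k] * x[(i + k) % n] for k in range(n)) for i in range(n)]


def parc_1d(
    x: Sequence[float], kernel: Sequence[float], pos_embed: Sequence[float]
) -> List[float]:
    """Assumed ParC-style op: add a position embedding, then convolve circularly.

    Without the embedding, a circular convolution is shift-equivariant and
    cannot tell positions apart; the embedding breaks that symmetry.
    """
    if len(pos_embed) != len(x):
        raise ValueError("position embedding must match the input length")
    shifted = [xi + pi for xi, pi in zip(x, pos_embed)]
    return circular_conv1d(shifted, kernel)


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    kernel = [1, 2, 0, -1]
    zeros = [0, 0, 0, 0]
    pos = [0, 1, 2, 3]

    # Without position information, shifting the input only shifts the output.
    print(parc_1d(roll(x), kernel, zeros) == roll(parc_1d(x, kernel, zeros)))
    # With a position embedding, the output depends on where features sit.
    print(parc_1d(roll(x), kernel, pos) == roll(parc_1d(x, kernel, pos)))
```

In this sketch the 2D case would apply the same operation along the rows and along the columns of a feature map; how the actual model arranges those passes and combines them with squeeze-excitation is left to the source code linked above.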