In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3, which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, as opposed to traditional residual models, which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide the intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy and the number of operations measured by multiply-adds (MAdd), as well as the number of parameters.
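To make the inverted residual structure concrete, below is a minimal PyTorch sketch of one such block, assuming the paper's usual choices (expansion ratio 6, ReLU6 activations, batch normalization after each convolution). The class name `InvertedResidual` and the constructor arguments are illustrative assumptions, not the authors' reference code; the key properties it demonstrates are the thin input/output bottlenecks, the depthwise convolution applied in the expanded intermediate layer, and the linear (non-linearity-free) projection back to the narrow layer.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Illustrative inverted residual block with a linear bottleneck."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # Residual connection joins the thin bottlenecks, and only
        # applies when spatial size and channel count are preserved.
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise "expansion" to a wider representation
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution filters the expanded features
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise projection back to a thin bottleneck;
            # no non-linearity here, preserving representational power
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return x + self.block(x) if self.use_residual else self.block(x)
```

For example, `InvertedResidual(24, 24)` keeps the residual shortcut, while `InvertedResidual(24, 32, stride=2)` downsamples and drops it.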