We introduce a novel and generic convolutional unit, the DiCE unit, which is built using dimension-wise convolutions and dimension-wise fusion. The dimension-wise convolutions apply light-weight convolutional filtering across each dimension of the input tensor, while dimension-wise fusion efficiently combines these dimension-wise representations. Together, they allow the DiCE unit to efficiently encode the spatial and channel-wise information contained in the input tensor. The DiCE unit is simple and can be easily plugged into any architecture to improve its efficiency and performance. Compared to depth-wise separable convolutions, the DiCE unit shows significant improvements across different architectures. When DiCE units are stacked to build the DiCENet model, we observe significant improvements over state-of-the-art models across various computer vision tasks, including image classification, object detection, and semantic segmentation. On the ImageNet dataset, DiCENet delivers the same or better performance than existing models while requiring fewer floating-point operations (FLOPs). Notably, at a network size of about 70 MFLOPs, DiCENet outperforms MNASNet, a state-of-the-art model found by neural architecture search, by 4% on the ImageNet dataset. Our code is open source and available at https://github.com/sacmehta/EdgeNets.
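As a rough illustration of the two building blocks described above, the following NumPy sketch applies a 3×3 depth-wise filter along each of the three dimensions of a D×H×W tensor (treating channels, width, and height in turn as the "depth" axis) and then fuses the three responses. It is a simplified sketch under stated assumptions, not the EdgeNets implementation: the function names (`depthwise_conv`, `dim_conv`, `dim_fuse`, `init_params`, `dice_unit`) are hypothetical, the batch dimension and nonlinearities are omitted, and the fusion step is assumed to be a per-channel weighted sum of the three responses (equivalent to a grouped 1×1 convolution with D groups), which may differ from the paper's dimension-wise fusion.

```python
import numpy as np


def depthwise_conv(x, kernels):
    """Depth-wise 2D cross-correlation with zero 'same' padding.

    x:       (N, A, B) array, i.e. N independent A x B planes.
    kernels: (N, k, k) array, one k x k filter per plane (k odd).
    """
    n, a, b = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((n, a, b))
    # Accumulate each kernel tap over the shifted, padded input.
    for i in range(k):
        for j in range(k):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + a, j:j + b]
    return out


def dim_conv(x, k_c, k_w, k_h):
    """Hypothetical dimension-wise convolution on a (D, H, W) tensor."""
    # Channel-wise: one filter per channel over the (H, W) plane.
    y_c = depthwise_conv(x, k_c)
    # Width-wise: move W to the front, filter each (D, H) plane, move back.
    y_w = depthwise_conv(x.transpose(2, 0, 1), k_w).transpose(1, 2, 0)
    # Height-wise: move H to the front, filter each (D, W) plane, move back.
    y_h = depthwise_conv(x.transpose(1, 0, 2), k_h).transpose(1, 0, 2)
    return y_c, y_w, y_h


def dim_fuse(y_c, y_w, y_h, weights):
    """Assumed fusion: per-channel weighted sum of the three responses.

    weights: (D, 3) array; equivalent to a 1x1 convolution with D groups.
    """
    stacked = np.stack([y_c, y_w, y_h], axis=1)  # (D, 3, H, W)
    return np.einsum("dk,dkhw->dhw", weights, stacked)


def init_params(d, h, w, k=3, seed=0):
    """Random filters; H and W are fixed because filters are per position."""
    rng = np.random.default_rng(seed)
    return {
        "k_c": 0.1 * rng.standard_normal((d, k, k)),
        "k_w": 0.1 * rng.standard_normal((w, k, k)),
        "k_h": 0.1 * rng.standard_normal((h, k, k)),
        "fuse": np.full((d, 3), 1.0 / 3.0),
    }


def dice_unit(x, params):
    """Sketch of a DiCE-style unit: dimension-wise conv, then fusion."""
    y_c, y_w, y_h = dim_conv(x, params["k_c"], params["k_w"], params["k_h"])
    return dim_fuse(y_c, y_w, y_h, params["fuse"])


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((8, 16, 12))
    out = dice_unit(x, init_params(8, 16, 12))
    print(out.shape)  # (8, 16, 12)
```

Because the width-wise and height-wise filters in this sketch are indexed by spatial position, H and W are fixed when the parameters are created. With identity kernels (a single 1 at the center tap) and equal fusion weights of 1/3, the unit returns its input unchanged, which makes a convenient sanity check.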