As a variant of standard convolution, dilated convolution can control the effective receptive field and handle large variation in object scale without introducing additional computational cost. To fully explore the potential of dilated convolution, we propose a new type of dilated convolution, referred to as inception convolution, in which the convolution operations have independent dilation patterns across different axes, channels, and layers. To make learning such complex inception convolutions from data practical, we develop a simple but effective search algorithm, referred to as efficient dilation optimization (EDO). Based on statistical optimization, EDO operates at low cost and is extremely fast when applied to large-scale datasets. Empirical results show that our method achieves consistent performance gains on image recognition, object detection, instance segmentation, human detection, and human pose estimation. For instance, simply replacing the 3×3 standard convolutions in the ResNet-50 backbone with inception convolutions improves the AP of Faster R-CNN on MS COCO from 36.4% to 39.2%.
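The idea of giving each axis and each channel its own dilation can be illustrated with a small sketch. The code below is not the authors' implementation and does not include the EDO search: the function names (`effective_kernel_size`, `dilated_conv2d`, `inception_conv2d`) are hypothetical, the input is restricted to a single channel for brevity, and the dilation pairs are hand-picked rather than learned. It only shows that a 3×3 kernel keeps the same nine multiply-adds per output position while its footprint changes with the per-axis dilation.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis: d * (k - 1) + 1."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv2d(image, kernel, dilation):
    """Zero-padded, same-size 2D cross-correlation of a single channel.

    `dilation` is a (dil_y, dil_x) pair, so the two axes may differ.
    """
    dil_y, dil_x = dilation
    height, width = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    c_y, c_x = k_h // 2, k_w // 2  # kernel centre
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    # Taps are spread apart by the per-axis dilation.
                    yy = y + (i - c_y) * dil_y
                    xx = x + (j - c_x) * dil_x
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def inception_conv2d(image, kernels, dilations):
    """One output channel per (kernel, dilation) pair.

    Assumption for this sketch: every output channel reads the same
    single-channel input and carries its own hand-picked dilation.
    """
    return [dilated_conv2d(image, k, d) for k, d in zip(kernels, dilations)]


# A unit impulse makes each channel's footprint visible in its output.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
ones_3x3 = [[1.0] * 3 for _ in range(3)]
channel_dilations = [(1, 1), (1, 3), (2, 1)]  # (dil_y, dil_x) per channel
outputs = inception_conv2d(impulse, [ones_3x3] * 3, channel_dilations)
```

Each entry of `outputs` has nine nonzero responses, but they are laid out differently: the second channel, with dilation (1, 3), spans 3 rows and 7 columns, while the third, with dilation (2, 1), spans 5 rows and 3 columns.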