The size and shape of the receptive field determine how a network aggregates local information and considerably affect the overall performance of a model. Many components of a neural network, such as the kernel sizes and strides of convolution and pooling operations, influence the configuration of the receptive field. However, these components are typically fixed as hyperparameters, so the receptive fields of existing models often end up with suboptimal shapes and sizes. Hence, we propose a simple yet effective Dynamically Optimized Pooling operation, referred to as DynOPool, which optimizes the scale factors of feature maps end-to-end by learning the desirable size and shape of the receptive field in each layer. Any resizing module in a deep neural network can be replaced by a DynOPool operation at minimal cost. In addition, DynOPool controls model complexity through an additional loss term that constrains computational cost. Our experiments show that models equipped with the proposed learnable resizing module outperform the baseline networks on multiple datasets in image classification and semantic segmentation.
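To make the idea concrete, below is a minimal PyTorch sketch of what such a learnable resizing layer could look like. The module name `LearnableResize`, the log-scale parameterization, and the `r / r.detach()` grid-coupling trick are illustrative assumptions rather than the paper's exact formulation; the sketch only demonstrates how a per-layer scale factor r = (r_h, r_w) can receive gradients through a bilinear sampling grid even though the rounded output resolution itself is non-differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableResize(nn.Module):
    """Illustrative DynOPool-style resizing layer (not the official code).

    A learnable per-layer scale factor r = (r_h, r_w) sets the output
    resolution to round(r * input_resolution). The rounding step is not
    differentiable, but the bilinear sampling grid is coupled to r, so
    gradients still reach the scale parameters during training.
    """

    def __init__(self, init_scale: float = 0.5):
        super().__init__()
        # Optimize log(r) so the scale factor stays strictly positive.
        self.log_scale = nn.Parameter(torch.log(torch.full((2,), init_scale)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        r = self.log_scale.exp()                    # (r_h, r_w) > 0
        out_h = max(int(torch.round(r[0] * h)), 1)  # rounded target height
        out_w = max(int(torch.round(r[1] * w)), 1)  # rounded target width
        # Query points in normalized [-1, 1] coordinates for grid_sample.
        ys = torch.linspace(-1.0, 1.0, out_h, device=x.device, dtype=x.dtype)
        xs = torch.linspace(-1.0, 1.0, out_w, device=x.device, dtype=x.dtype)
        # r / r.detach() equals 1 in value but carries d(grid)/d(r), so the
        # task loss can push each scale factor up or down. This coupling is
        # an assumption standing in for the paper's exact gradient derivation.
        grid_y = ys * (r[0] / r[0].detach())
        grid_x = xs * (r[1] / r[1].detach())
        yy, xx = torch.meshgrid(grid_y, grid_x, indexing="ij")
        grid = torch.stack((xx, yy), dim=-1)        # (out_h, out_w, 2), (x, y) order
        grid = grid.unsqueeze(0).expand(n, -1, -1, -1)
        return F.grid_sample(x, grid, mode="bilinear", align_corners=True)
```

Such a module is a drop-in replacement for a fixed pooling or strided-downsampling layer, e.g. `y = LearnableResize()(torch.randn(2, 16, 32, 32))`. The complexity control mentioned in the abstract could then take the form of a regularizer such as `loss = task_loss + lam * sum(r_h * r_w for each layer)`, penalizing large retained resolutions; the weight `lam` and this particular cost proxy are likewise assumptions made for illustration.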