The convolution operation is a central building block of neural network architectures widely used in computer vision. The size of the convolution kernels determines both the expressiveness of convolutional neural networks (CNNs) and the number of learnable parameters. Increasing the network capacity to capture rich pixel relationships requires increasing the number of learnable parameters, often leading to overfitting and/or lack of robustness. In this paper, we propose a powerful novel building block, the hyper-convolution, which implicitly represents the convolution kernel as a function of kernel coordinates. Hyper-convolutions decouple the kernel size, and hence the receptive field, from the number of learnable parameters. In our experiments, focused on challenging biomedical image segmentation tasks, we demonstrate that replacing regular convolutions with hyper-convolutions leads to more efficient architectures that achieve improved accuracy. Our analysis also shows that learned hyper-convolutions are naturally regularized, which can offer better generalization performance. We believe that hyper-convolutions can be a powerful building block in future neural network architectures solving computer vision tasks.
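To make the decoupling concrete, the following is a minimal NumPy sketch, not the paper's implementation: a small two-layer MLP (its width, activation, and all variable names are illustrative assumptions) maps each normalized kernel coordinate to a kernel weight, so the same fixed set of parameters can generate a kernel of any size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coordinate-to-weight MLP. Its parameter count is fixed
# and does NOT grow with the kernel size k (in a real model these
# weights would be learned by backpropagation).
HIDDEN = 16
W1 = rng.normal(scale=0.5, size=(2, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.5, size=(HIDDEN, 1))
b2 = np.zeros(1)

def hyper_kernel(k):
    """Generate a k x k convolution kernel from the coordinate MLP."""
    # Normalized (dx, dy) offsets for every kernel position: (k*k, 2)
    coords = np.stack(np.meshgrid(
        np.linspace(-1.0, 1.0, k), np.linspace(-1.0, 1.0, k),
        indexing="ij"
    ), axis=-1).reshape(-1, 2)
    h = np.tanh(coords @ W1 + b1)        # hidden activations
    return (h @ W2 + b2).reshape(k, k)   # kernel weight at each offset

# The same MLP yields a 3x3 or 7x7 kernel; the parameter count is constant.
n_params = W1.size + b1.size + W2.size + b2.size
k3 = hyper_kernel(3)
k7 = hyper_kernel(7)
```

In this sketch a 7x7 kernel would ordinarily require 49 learnable weights per channel pair, but the coordinate MLP produces it from the same fixed parameter budget as a 3x3 kernel, which is the receptive-field/parameter decoupling the abstract describes.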