In this paper, we present a new neural architectural block for the vision domain, named Mixing Regionally and Locally (MRL), designed to mix the provided input features effectively and efficiently. We split the feature-mixing task into mixing at a regional scale and mixing at a local scale. To mix efficiently, we use the domain-wide receptive field provided by self-attention for regional-scale mixing, and convolutional kernels restricted to a local scale for local-scale mixing. More specifically, our proposed method first mixes regional features, each associated with the local features within a defined region, and then performs a local-scale feature mix augmented by the regional features. Experiments show that this hybridization of self-attention and convolution improves capacity, generalization (through the right inductive bias), and efficiency. Under similar network settings, MRL outperforms or is on par with its counterparts in classification, object detection, and segmentation tasks. We also show that our MRL-based network architecture achieves state-of-the-art performance on H&E histology datasets, with Dice scores of 0.843, 0.855, and 0.892 on the Kumar, CoNSep, and CPM-17 datasets, respectively. These results also highlight the versatility of the MRL framework, which can incorporate layers such as group convolutions to improve dataset-specific generalization.
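As a rough illustration of the two-scale idea described above, the following dependency-free Python sketch pools local features into one token per region, mixes those tokens with self-attention, adds each mixed token back to the positions in its region, and then applies a local 3x3 filter. It is not the authors' implementation. The average pooling, the identity query/key/value projections, the additive augmentation, and the fixed box filter are all assumptions made for brevity, and names such as `mrl_like_block` are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def regional_tokens(feat, r):
    """Average the local features in each r x r region; returns tokens in row-major order."""
    H, W, C = len(feat), len(feat[0]), len(feat[0][0])
    tokens = []
    for i in range(0, H, r):
        for j in range(0, W, r):
            cells = [feat[y][x] for y in range(i, i + r) for x in range(j, j + r)]
            tokens.append([sum(c[k] for c in cells) / len(cells) for k in range(C)])
    return tokens

def self_attention(tokens):
    """Single-head attention across all tokens. Assumption: Q = K = V = tokens (no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(w[n] * tokens[n][c] for n in range(len(tokens))) for c in range(d)])
    return out

def local_mix(feat):
    """Per-channel 3x3 box filter with zero padding, standing in for a learned convolution."""
    H, W, C = len(feat), len(feat[0]), len(feat[0][0])
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for y in range(H):
        for x in range(W):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W:
                        for c in range(C):
                            out[y][x][c] += feat[yy][xx][c] / 9.0
    return out

def mrl_like_block(feat, r):
    """Regional mixing (attention over region tokens) followed by regionally augmented local mixing."""
    H, W, C = len(feat), len(feat[0]), len(feat[0][0])
    assert H % r == 0 and W % r == 0, "this sketch needs H and W divisible by r"
    cols = W // r                                  # number of regions per row
    mixed = self_attention(regional_tokens(feat, r))
    # Augment every local feature with the mixed token of the region it belongs to.
    augmented = [
        [[feat[y][x][c] + mixed[(y // r) * cols + (x // r)][c] for c in range(C)]
         for x in range(W)]
        for y in range(H)
    ]
    return local_mix(augmented)

if __name__ == "__main__":
    # A 4x4 feature map with 2 channels, split into four 2x2 regions.
    feat = [[[float(y + x), float(y - x)] for x in range(4)] for y in range(4)]
    out = mrl_like_block(feat, r=2)
    print(len(out), len(out[0]), len(out[0][0]))
```

In this sketch the attention cost grows with the number of regions rather than the number of pixels, which is one plausible reading of where the efficiency of a regional/local split comes from. A real block would use learned projections and convolution weights.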