In this paper, we tackle the problem of convolutional neural network design. Instead of focusing on the design of the overall architecture, we investigate a design space that is usually overlooked, i.e., adjusting the channel configurations of predefined networks. We find that this adjustment can be achieved by shrinking widened baseline networks and leads to superior performance. Based on this observation, we articulate the heterogeneity hypothesis: under the same training protocol, there exists a layer-wise differentiated network architecture (LW-DNA) that can outperform the original network with its regular channel configuration while having a lower level of model complexity. The LW-DNA models are identified without extra computational cost or training time compared with the original network. This constraint leads to controlled experiments that direct the focus to the importance of layer-wise specific channel configurations. LW-DNA models also come with advantages related to overfitting, i.e., the relative relationship between model complexity and dataset size. Experiments are conducted on various networks and datasets for image classification, visual tracking, and image restoration. The resultant LW-DNA models consistently outperform the baseline models. Code is available at https://github.com/ofsoundof/Heterogeneity_Hypothesis.
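To make the notion of a layer-wise differentiated channel configuration concrete, the short PyTorch sketch below contrasts a regular, uniform configuration with a per-layer one of lower complexity. The specific widths are illustrative assumptions for this sketch only, not configurations produced by the paper's shrinkage procedure.

```python
# Minimal sketch (not the paper's method): compare a regular, uniform
# channel configuration with a hypothetical layer-wise differentiated one.
import torch
import torch.nn as nn

def make_convnet(widths):
    """Stack 3x3 conv blocks whose output channels follow `widths`."""
    layers, in_ch = [], 3
    for out_ch in widths:
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    return nn.Sequential(*layers)

regular = make_convnet([64, 64, 64, 64])  # uniform channel configuration
lw_dna = make_convnet([48, 72, 56, 40])   # hypothetical per-layer widths

count = lambda m: sum(p.numel() for p in m.parameters())
print(f"regular: {count(regular):,} params")  # baseline complexity
print(f"lw-dna : {count(lw_dna):,} params")   # lower complexity here
```

With these assumed widths, the differentiated network has fewer parameters than the uniform baseline; the hypothesis is that such a configuration, properly identified, can also train to higher accuracy under the same protocol.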