Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that outperform human-designed ones on large-scale image classification problems. In this paper, we study NAS for semantic image segmentation, an important computer vision task that assigns a semantic label to every pixel in an image. Existing works often focus on searching the repeatable cell structure while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction, which exhibits many more network-level architectural variations. Therefore, we propose to search the network-level structure in addition to the cell-level structure, which forms a hierarchical architecture search space. We present a network-level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Without any ImageNet pretraining, our architecture, searched specifically for semantic image segmentation, attains state-of-the-art performance.