Semantic segmentation for scene understanding is now in wide demand, which raises significant challenges for algorithm efficiency, especially in applications on resource-limited platforms. Current segmentation models are trained and evaluated on massive high-resolution scene images (the "data level") and suffer from expensive computation arising from the required multi-scale aggregation (the "network level"). At both levels, the computational and energy costs of training and inference are notable, owing to the large input resolutions that are often desired and the heavy computational burden of segmentation models. To this end, we propose DANCE, a general framework for automated DAta-Network Co-optimization for Efficient segmentation model training and inference. Unlike existing efficient segmentation approaches, which focus merely on lightweight network design, DANCE performs automated, simultaneous data-network co-optimization through both input data manipulation and network architecture slimming. Specifically, DANCE integrates automated data slimming, which adaptively downsamples or drops input images and controls their corresponding contribution to the training loss, guided by the images' spatial complexity. Besides directly reducing the cost associated with the input size, such downsampling also shrinks the dynamic range of input object and context scales, which motivates us to also adaptively slim the network to match the downsampled data. Extensive experiments and ablation studies (on four state-of-the-art (SOTA) segmentation models with three popular segmentation datasets under two training settings) demonstrate that DANCE can achieve an "all-win" towards efficient segmentation: reduced training cost, less expensive inference, and better mean Intersection-over-Union (mIoU).
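To make the idea of complexity-guided data slimming concrete, here is a toy sketch. It is not DANCE's actual algorithm. The complexity proxy (mean absolute difference between neighboring pixels), the thresholds `drop_below` and `downsample_below`, the fixed 2x average-pooling downsampling, and the loss weights 0.0 / 0.5 / 1.0 are all hypothetical choices made for illustration. Images are plain nested lists of grayscale floats, so the example needs only the standard library.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def spatial_complexity(img: Image) -> float:
    """Hypothetical complexity proxy: mean absolute difference between
    horizontally and vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(img[y][x + 1] - img[y][x])
                count += 1
            if y + 1 < h:
                total += abs(img[y + 1][x] - img[y][x])
                count += 1
    return total / count if count else 0.0


def downsample2x(img: Image) -> Image:
    """2x2 average pooling; assumes even height and width."""
    return [
        [
            (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            for x in range(0, len(img[0]), 2)
        ]
        for y in range(0, len(img), 2)
    ]


def slim_sample(
    img: Image, drop_below: float = 0.01, downsample_below: float = 0.2
) -> Tuple[Optional[Image], float]:
    """Hypothetical policy: drop near-uniform images, downsample simple ones,
    keep complex ones at full resolution. Returns (image or None, loss weight)."""
    c = spatial_complexity(img)
    if c < drop_below:
        return None, 0.0  # dropped: contributes nothing to the loss
    if c < downsample_below:
        return downsample2x(img), 0.5  # cheaper input, reduced loss weight
    return img, 1.0  # full resolution, full loss weight
```

In a training loop, the returned weight would scale that sample's loss (e.g. `weight * loss`), so dropped images cost nothing and downsampled images contribute less. Per the abstract, DANCE additionally slims the network to match the downsampled data, which this sketch does not model.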