High resolution and strong semantic representation are both vital for dense prediction. Empirically, low-resolution feature maps often achieve stronger semantic representation, whereas high-resolution feature maps are generally better at identifying local features such as edges but contain weaker semantic information. Existing state-of-the-art frameworks such as HRNet keep low-resolution and high-resolution feature maps in parallel and repeatedly exchange information across the different resolutions. However, we believe that the lowest-resolution feature map often contains the strongest semantic information and needs to pass through more layers before it is merged with the high-resolution feature maps. For high-resolution feature maps, by contrast, the computational cost of each convolutional layer is very large, and there is no need to pass through so many layers. Therefore, we design a U-shaped High-Resolution Network (U-HRNet), which adds more stages after the feature map with the strongest semantic representation and relaxes the HRNet constraint that all resolutions must be computed in parallel for a newly added stage. More computation is allocated to the low-resolution feature maps, which significantly improves the overall semantic representation. U-HRNet is a drop-in substitute for the HRNet backbone and achieves significant improvements on multiple semantic segmentation and depth prediction datasets, under exactly the same training and inference settings, with almost no increase in the amount of computation. Code is available at PaddleSeg: https://github.com/PaddlePaddle/PaddleSeg.
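The cost argument above can be made concrete with a rough count of multiply-accumulates (MACs) for a single 3x3 convolution at each resolution. The sketch below is only an illustration, not part of U-HRNet or PaddleSeg: the input size, the strides (1/4 to 1/32) and the fixed channel width of 64 are assumptions chosen for the example, and the helper names `conv_macs` and `per_branch_macs` are hypothetical.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """MACs of one stride-1, same-padded kernel x kernel convolution (bias ignored)."""
    return height * width * c_in * c_out * kernel * kernel


def per_branch_macs(image_hw=(512, 1024), strides=(4, 8, 16, 32), channels=64):
    """Cost of one conv layer on each resolution branch.

    Assumption for illustration: every branch uses the same channel width,
    so the only thing that varies is the spatial size of the feature map.
    """
    h, w = image_hw
    return {s: conv_macs(h // s, w // s, channels, channels) for s in strides}


if __name__ == "__main__":
    macs = per_branch_macs()
    lowest = macs[32]
    for stride, cost in macs.items():
        # Report each branch relative to the lowest-resolution branch.
        print(f"1/{stride} resolution: {cost:,} MACs "
              f"({cost // lowest}x the 1/32 branch)")
```

At a fixed channel width the cost grows with the spatial area of the feature map, so under these assumptions a layer on the 1/4-resolution branch costs 64 times as much as the same layer on the 1/32-resolution branch. Real backbones usually widen the channels as resolution drops, which narrows this gap, so the numbers here should be read as an upper-level intuition rather than a measurement of any particular network.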