We propose a new network architecture, Fractal Pyramid Networks (PFNs), for pixel-wise prediction tasks as an alternative to the widely used encoder-decoder structure. In the encoder-decoder structure, the input is processed by an encoding-decoding pipeline that tries to produce a semantic, large-channel feature. In contrast, the proposed PFNs maintain multiple information processing pathways and encode the information into multiple separate small-channel features. On the task of self-supervised monocular depth estimation, even without ImageNet pretraining, our models compete with or outperform state-of-the-art methods on the KITTI dataset with far fewer parameters. Moreover, the visual quality of the predictions is significantly improved. An experiment on semantic segmentation provides evidence that PFNs can be applied to other pixel-wise prediction tasks and demonstrates that our models capture more global structural information.