MVSFormer: Learning Robust Image Representations via Transformers and Temperature-based Depth for Multi-View Stereo

Feature representation learning is the key ingredient of learning-based Multi-View Stereo (MVS). As the common feature extractor of learning-based MVS, the vanilla Feature Pyramid Network (FPN) suffers from poor feature representations in reflective and texture-less areas, which limits the generalization of MVS. Even FPNs paired with pre-trained Convolutional Neural Networks (CNNs) fail to tackle these issues. On the other hand, Vision Transformers (ViTs) have achieved prominent success in many 2D vision tasks. We therefore ask whether ViTs can facilitate feature learning in MVS. In this paper, we propose a pre-trained ViT-enhanced MVS network called MVSFormer, which learns more reliable feature representations by benefiting from the informative priors of the ViT. We further propose two variants, MVSFormer-P with fixed ViT weights and MVSFormer-H with trainable ones. MVSFormer-P is more efficient, while MVSFormer-H achieves superior performance. To make ViTs robust to arbitrary resolutions for MVS tasks, we propose an efficient multi-scale training strategy with gradient accumulation. Moreover, we discuss the merits and drawbacks of classification-based and regression-based MVS methods, and further propose to unify them with a temperature-based strategy. MVSFormer achieves state-of-the-art performance on the DTU dataset. In particular, our anonymous submission of MVSFormer ranked first on both the intermediate and advanced sets of the highly competitive Tanks-and-Temples leaderboard on the day of submission, compared with other published works. Code and models will be released.
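
The abstract does not spell out how the temperature-based strategy unifies classification and regression, so the following is only a minimal sketch under an assumption: that the depth of a pixel is the expectation of the depth hypotheses under a softmax whose logits are divided by a temperature. All function names and the toy numbers are hypothetical and are not taken from the MVSFormer code.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classification_depth(logits, hypotheses):
    """Winner-take-all: return the depth of the highest-scoring hypothesis."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return hypotheses[best]


def regression_depth(logits, hypotheses):
    """Soft expectation of the depth hypotheses under a plain softmax."""
    probs = softmax(logits)
    return sum(p * d for p, d in zip(probs, hypotheses))


def temperature_depth(logits, hypotheses, temperature):
    """Assumed unification: expectation under a temperature-scaled softmax."""
    probs = softmax(logits, temperature)
    return sum(p * d for p, d in zip(probs, hypotheses))


# Toy per-pixel example: four depth hypotheses and their matching scores.
hypotheses = [1.0, 2.0, 3.0, 4.0]
logits = [0.0, 2.0, 1.0, 0.0]

for t in (1.0, 0.5, 0.01):
    print(t, temperature_depth(logits, hypotheses, t))
```

Under this assumption, a temperature of 1 reproduces the regression-style expectation, while a temperature close to 0 sharpens the distribution until the result approaches the winner-take-all depth of the classification view.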
