The increasing availability of video recordings made by multiple cameras has offered new means for mitigating occlusion and depth ambiguities in pose and motion reconstruction methods. Yet multi-view algorithms depend strongly on camera parameters, in particular the relative positions between the cameras. This dependency becomes a hurdle when shifting to dynamic capture in uncontrolled settings. We introduce FLEX (Free muLti-view rEconstruXion), an end-to-end parameter-free multi-view model. FLEX is parameter-free in the sense that it does not require any camera parameters, either intrinsic or extrinsic. Our key idea is that the 3D angles between skeletal parts, as well as bone lengths, are invariant to the camera position. Hence, learning 3D rotations and bone lengths rather than locations allows predicting common values for all camera views. Our network takes multiple video streams, learns fused deep features through a novel multi-view fusion layer, and reconstructs a single consistent skeleton with temporally coherent joint rotations. We demonstrate quantitative and qualitative results on the Human3.6M and KTH Multi-view Football II datasets, and on synthetic multi-person video streams captured by dynamic cameras. We compare our model to state-of-the-art methods that are not parameter-free and show that, in the absence of camera parameters, we outperform them by a large margin, while obtaining comparable results when camera parameters are available. Code, trained models, video examples, and more material will be available on our project page.
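The invariance claim behind the key idea can be checked numerically. The following is a minimal sketch, not the FLEX implementation: it assumes a hypothetical five-joint toy skeleton (`PARENTS`), models each camera as a random rigid transform, and uses illustrative helper names (`random_rotation`, `bone_lengths`, `bone_cosines`) that do not come from the paper. It compares the cosines of inter-bone angles rather than the angles themselves, to avoid the numerical sensitivity of `arccos` near zero.

```python
import numpy as np

# Hypothetical toy skeleton: PARENTS[i] is the parent joint of joint i (-1 = root).
PARENTS = np.array([-1, 0, 1, 1, 0])

def random_rotation(rng):
    """Draw a random proper rotation matrix (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # resolve the sign ambiguity of QR
    if np.linalg.det(q) < 0:         # turn a reflection into a rotation
        q[:, 0] = -q[:, 0]
    return q

def bone_vectors(joints, parents):
    """Vector from each non-root joint's parent to the joint itself."""
    children = np.arange(1, len(parents))
    return joints[children] - joints[parents[1:]]

def bone_lengths(joints, parents):
    return np.linalg.norm(bone_vectors(joints, parents), axis=1)

def bone_cosines(joints, parents):
    """Cosine of the 3D angle between every pair of bones."""
    b = bone_vectors(joints, parents)
    u = b / np.linalg.norm(b, axis=1, keepdims=True)
    return u @ u.T

def to_camera(joints, rotation, translation):
    """Express world-space joints in a camera's coordinate frame."""
    return joints @ rotation.T + translation

rng = np.random.default_rng(0)
world = rng.normal(size=(len(PARENTS), 3))   # random 3D joint locations

# Two cameras with unrelated, randomly drawn extrinsics.
cam_a = to_camera(world, random_rotation(rng), rng.normal(size=3))
cam_b = to_camera(world, random_rotation(rng), rng.normal(size=3))

print("joint locations agree:", np.allclose(cam_a, cam_b))
print("bone lengths agree:   ",
      np.allclose(bone_lengths(cam_a, PARENTS), bone_lengths(cam_b, PARENTS)))
print("bone angles agree:    ",
      np.allclose(bone_cosines(cam_a, PARENTS), bone_cosines(cam_b, PARENTS)))
```

Joint locations differ between the two camera frames, while bone lengths and inter-bone angles coincide, which is why a single set of these quantities can be shared across all views.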