Robustly estimating 3D hand meshes from RGB images is a highly desirable task, made challenging by the hand's many degrees of freedom and by issues such as self-similarity and occlusions. Previous methods generally either use parametric 3D hand models or follow a model-free approach. While the former can be considered more robust, e.g. to occlusions, they are less expressive. We propose a hybrid approach, utilizing a deep neural network and differentiable-rendering-based optimization to demonstrably achieve the best of both worlds. In addition, we explore Virtual Reality (VR) as an application. Most VR headsets are nowadays equipped with multiple cameras, which we leverage by extending our method to the egocentric stereo domain. This extension proves more resilient to the above-mentioned issues. Finally, as a use case, we show that the improved image-model alignment can be used to acquire the user's hand texture, which leads to a more realistic virtual hand representation.
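To make the hybrid idea concrete, the sketch below shows a minimal refinement loop: a network's initial hand-model parameters are improved by gradient-based optimization against image evidence. This is only an illustration under stated assumptions, not the paper's actual pipeline; a toy linear hand model stands in for a parametric model such as MANO, and a 2D reprojection loss stands in for the silhouette/photometric terms a full differentiable renderer would provide. All names (hand_vertices, project, refine) are hypothetical.

```python
import torch

def hand_vertices(pose, shape, basis, template):
    # Toy linear blend model: vertices = template + offsets from pose/shape coefficients.
    return template + basis @ torch.cat([pose, shape])

def project(vertices, focal=500.0, center=112.0):
    # Pinhole projection of 3D vertices (N, 3) to 2D pixel coordinates (N, 2).
    return focal * vertices[:, :2] / vertices[:, 2:3] + center

def refine(pose_init, shape_init, target_2d, basis, template, steps=200, lr=1e-2):
    # Start from the network prediction and refine by minimizing a reprojection
    # loss (a stand-in for differentiable-rendering-based image alignment).
    pose = pose_init.clone().requires_grad_(True)
    shape = shape_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([pose, shape], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred_2d = project(hand_vertices(pose, shape, basis, template))
        loss = torch.nn.functional.mse_loss(pred_2d, target_2d)
        loss.backward()
        opt.step()
    return pose.detach(), shape.detach()

# Toy usage with random data standing in for network outputs and image evidence.
torch.manual_seed(0)
n_verts, n_pose, n_shape = 64, 45, 10
basis = torch.randn(n_verts, 3, n_pose + n_shape) * 0.01
template = torch.randn(n_verts, 3) * 0.05 + torch.tensor([0.0, 0.0, 0.5])

pose_gt, shape_gt = torch.randn(n_pose) * 0.1, torch.randn(n_shape) * 0.1
target_2d = project(hand_vertices(pose_gt, shape_gt, basis, template))

pose0 = pose_gt + torch.randn(n_pose) * 0.05    # noisy "network" prediction
shape0 = shape_gt + torch.randn(n_shape) * 0.05
pose_ref, shape_ref = refine(pose0, shape0, target_2d, basis, template)
```

In this toy setting the optimization simply pulls the predicted parameters back toward values consistent with the 2D observations, mirroring how the hybrid approach uses rendering-based optimization to tighten image-model alignment on top of a robust network initialization.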