We propose a novel optimization-based paradigm for 3D human model fitting on images and scans. In contrast to existing approaches that directly regress the parameters of a low-dimensional statistical body model (e.g., SMPL) from input images, we train an ensemble of per-vertex neural field networks. The network predicts, in a distributed manner, the vertex descent direction towards the ground truth, based on neural features extracted at the current vertex projection. At inference, we employ this network, dubbed LVD, within a gradient-descent optimization pipeline until convergence, which typically occurs in a fraction of a second even when all vertices are initialized at a single point. An exhaustive evaluation demonstrates that our approach captures the underlying body of clothed people with very different body shapes, achieving a significant improvement over the state of the art. LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement over the SOTA with a much simpler and faster method.
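The inference procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_descent` is a hypothetical stand-in for the trained network (which, per the abstract, would condition on features extracted at each vertex projection), and the step size, iteration cap, and stopping tolerance are assumed values chosen only for the example.

```python
import numpy as np

def fit_vertices(predict_descent, num_vertices, num_steps=50,
                 step_size=1.0, tol=1e-4):
    """Iteratively move vertices along predicted descent directions."""
    # Start with all vertices collapsed to a single point (the origin here).
    vertices = np.zeros((num_vertices, 3))
    for _ in range(num_steps):
        # Hypothetical network call: one 3D displacement per vertex,
        # given the current vertex positions.
        delta = predict_descent(vertices)
        vertices = vertices + step_size * delta
        # Stop once the largest per-vertex update is negligible.
        if np.max(np.linalg.norm(delta, axis=1)) < tol:
            break
    return vertices

# Toy stand-in for the network: it points part of the way towards a
# known target, which a real model would have to infer from the input.
rng = np.random.default_rng(0)
target = rng.normal(size=(10, 3))

def oracle(v):
    return 0.5 * (target - v)

fitted = fit_vertices(oracle, num_vertices=10)
```

In this toy setting the oracle already knows the target, so the loop only shows the shape of the update rule: each vertex is moved independently, and the loop ends when the predicted displacements become small.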