In this paper, we revisit the problem of 3D human body modeling from two orthogonal silhouettes of an individual (i.e., front and side views). Different from our prior work, a supervised learning approach based on a convolutional neural network (CNN) is investigated to solve the problem by establishing a mapping function that can effectively extract features from the two silhouettes and fuse them into coefficients in the shape space of human bodies. A new CNN structure is proposed in our work to extract not only the discriminative features of the front and side views but also their mixed features for the mapping function. 3D human models with high accuracy are synthesized from the coefficients generated by the mapping function. Existing CNN approaches for 3D human modeling usually learn a large number of parameters (from 8.5M to 355.4M) from two binary images. In contrast, we investigate a new network architecture that takes point samples on the silhouettes as input. As a consequence, more accurate models can be generated by our network with only 2.4M parameters. The training of our network is conducted on samples obtained by augmenting a publicly accessible dataset. Transfer learning using datasets with a smaller number of scanned models is applied to our network to enable the generation of results with gender-oriented (or geographically oriented) patterns.
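The abstract does not spell out the network's layers, so the following is a minimal sketch of the kind of architecture it describes: two branches extracting per-view features from point samples on the silhouette contours, a mixed-feature pathway, and a head regressing shape-space coefficients. The names `TwoViewShapeNet` and `ViewBranch`, the channel widths, and the sizes `N_PTS` and `N_COEF` are illustrative assumptions, not the authors' implementation.

```python
# Sketch (assumed architecture): two-branch 1D CNN over sampled silhouette
# contour points, fused into coefficients of a PCA-like human shape space.
import torch
import torch.nn as nn

N_PTS = 648    # assumed number of (x, y) samples per silhouette contour
N_COEF = 50    # assumed dimension of the human shape space

class ViewBranch(nn.Module):
    """1D CNN over contour samples; extracts per-view discriminative features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # -> (B, 128, 1)
        )

    def forward(self, pts):                   # pts: (B, 2, N_PTS)
        return self.net(pts).squeeze(-1)      # -> (B, 128)

class TwoViewShapeNet(nn.Module):
    """Fuses front/side features and their mixture into shape coefficients."""
    def __init__(self):
        super().__init__()
        self.front = ViewBranch()
        self.side = ViewBranch()
        # "Mixed" pathway: jointly processes the concatenated per-view features.
        self.mixer = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.head = nn.Sequential(
            nn.Linear(256 + 256, 256), nn.ReLU(),
            nn.Linear(256, N_COEF),           # coefficients in the shape space
        )

    def forward(self, front_pts, side_pts):
        f, s = self.front(front_pts), self.side(side_pts)
        per_view = torch.cat([f, s], dim=1)   # per-view features (B, 256)
        mixed = self.mixer(per_view)          # mixed features    (B, 256)
        return self.head(torch.cat([per_view, mixed], dim=1))

model = TwoViewShapeNet()
coef = model(torch.randn(4, 2, N_PTS), torch.randn(4, 2, N_PTS))
print(coef.shape)  # torch.Size([4, 50])
```

Under these assumptions, synthesizing a body would amount to weighting the shape-space basis by `coef`, and the gender-oriented (or geographically oriented) transfer would correspond to fine-tuning such a network on the smaller scanned datasets.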