We present an approach to generating 3D human models from images. The key to our framework is that we predict double-sided orthographic depth maps and color images from a single perspective-projected image. Our framework consists of three networks. The first network predicts normal maps to recover geometric details such as wrinkles in clothing and facial regions. The second network predicts shade-removed images for the front and back views by utilizing the predicted normal maps. The last, multi-headed network takes both the normal maps and the shade-removed images and predicts depth maps while selectively fusing photometric and geometric information through multi-headed attention gates. Experimental results demonstrate that our method produces visually plausible results and achieves competitive performance against state-of-the-art methods on various evaluation metrics.
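To make the fusion step concrete, the following is a minimal sketch of how a multi-headed attention gate blending geometric (normal-map) and photometric (shade-removed-image) features might look. This is an illustrative assumption, not the authors' implementation: the module name MultiHeadAttentionGate, the inputs geo and photo, and the per-head sigmoid gating are all hypothetical choices.

```python
# Hypothetical sketch of a multi-headed attention gate (not the authors' code).
# Features from the geometric branch (geo) and photometric branch (photo) are
# blended per attention head by a learned gate before depth regression.
import torch
import torch.nn as nn


class MultiHeadAttentionGate(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        # One gate value per head, computed from both branches.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, num_heads, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, geo: torch.Tensor, photo: torch.Tensor) -> torch.Tensor:
        # geo, photo: (B, C, H, W) feature maps from the two branches.
        b, c, h, w = geo.shape
        g = self.gate(torch.cat([geo, photo], dim=1))  # (B, heads, H, W)
        g = g.unsqueeze(2)                             # (B, heads, 1, H, W)
        geo_h = geo.reshape(b, self.num_heads, self.head_dim, h, w)
        photo_h = photo.reshape(b, self.num_heads, self.head_dim, h, w)
        # Convex per-head blend: gate selects between geometry and photometry.
        fused = g * geo_h + (1.0 - g) * photo_h
        return fused.reshape(b, c, h, w)


if __name__ == "__main__":
    gate = MultiHeadAttentionGate(channels=64, num_heads=4)
    geo = torch.randn(2, 64, 128, 128)
    photo = torch.randn(2, 64, 128, 128)
    print(gate(geo, photo).shape)  # torch.Size([2, 64, 128, 128])
```

In this sketch each head learns an independent spatial gate, so some heads can favor geometric cues (e.g., around wrinkles) while others favor photometric cues; the paper's actual gating mechanism may differ.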