Image- and video-based 3D human recovery (i.e., pose and shape estimation) has achieved substantial progress. However, due to the prohibitive cost of motion capture, existing datasets are often limited in scale and diversity. In this work, we obtain massive human sequences by playing a video game with automatically annotated 3D ground truths. Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios. More importantly, we study the use of game-playing data and obtain five major insights. First, game-playing data is surprisingly effective. A simple frame-based baseline trained on GTA-Human outperforms more sophisticated methods by a large margin. For video-based methods, GTA-Human is even on par with the in-domain training set. Second, we discover that synthetic data provides critical complements to real data, which is typically collected indoors. Our investigation into the domain gap provides explanations for our simple yet useful data mixture strategies. Third, the scale of the dataset matters. The performance boost is closely related to the amount of additional data available. A systematic study reveals the model's sensitivity to data density from multiple key aspects. Fourth, the effectiveness of GTA-Human is also attributed to its rich collection of strong supervision labels (SMPL parameters), which are otherwise expensive to acquire in real datasets. Fifth, the benefits of synthetic data extend to larger models such as deeper convolutional neural networks (CNNs) and Transformers, for which a significant impact is also observed. We hope our work can pave the way for scaling up 3D human recovery to the real world. Homepage: https://caizhongang.github.io/projects/GTA-Human/