Existing lifting networks for regressing 3D human poses from 2D single-view poses are typically constructed with linear layers based on graph-structured representation learning. In sharp contrast, this paper presents Grid Convolution (GridConv), mimicking the wisdom of regular convolution operations in image space. GridConv is based on a novel Semantic Grid Transformation (SGT), which leverages a binary assignment matrix to map the irregular graph-structured human pose onto a regular weave-like grid pose representation joint by joint, enabling layer-wise feature learning with GridConv operations. We provide two ways to implement SGT: a handcrafted design and a learnable one. Surprisingly, both designs achieve promising results, and the learnable one is better, demonstrating the great potential of this new lifting representation learning formulation. To improve the ability of GridConv to encode contextual cues, we introduce an attention module over the convolutional kernel, making grid convolution operations input-dependent, spatial-aware and grid-specific. We show that our fully convolutional grid lifting network outperforms state-of-the-art methods by noticeable margins under (1) conventional evaluation on Human3.6M and (2) cross-evaluation on MPI-INF-3DHP. Code is available at https://github.com/OSVAI/GridConv.
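The core idea behind SGT can be sketched as a binary assignment from joints to grid cells. The following is a minimal illustrative sketch, not the paper's implementation: the 5x5 grid size, the row-major tiling of joints, and all variable names are assumptions made for demonstration; the paper's handcrafted SGT uses a semantically designed layout (and its learnable variant optimizes the assignment end-to-end).

```python
import numpy as np

# Assumed sizes: 17 Human3.6M joints mapped onto a hypothetical 5x5 grid.
# Some joints appear more than once to fill all 25 cells.
J, H, W = 17, 5, 5

# Binary assignment matrix T of shape (H*W, J): T[g, j] = 1 means grid
# cell g holds joint j. Here joints are tiled row-major purely for
# illustration; a real SGT would use a semantically meaningful layout.
T = np.zeros((H * W, J))
for g in range(H * W):
    T[g, g % J] = 1.0

pose_2d = np.random.randn(J, 2)             # input 2D pose: (x, y) per joint
grid_pose = (T @ pose_2d).reshape(H, W, 2)  # regular grid, ready for 2D conv
```

Once the pose lives on a regular grid, standard 2D convolutions (and attention over their kernels, as the paper proposes) can be applied layer by layer, and the inverse assignment maps grid features back to joints.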