Recently, vision transformers have shown great success in a set of human reconstruction tasks such as 2D human pose estimation (2D HPE), 3D human pose estimation (3D HPE), and human mesh reconstruction (HMR). In these tasks, feature map representations of the human structural information are often first extracted from the image by a CNN (such as HRNet) and then further processed by a transformer to predict the heatmaps (which encode each joint's location as a feature map with a Gaussian distribution) for HPE or HMR. However, existing transformer architectures cannot process these feature map inputs directly, forcing an unnatural flattening of the location-sensitive human structural information. Furthermore, much of the performance benefit in recent HPE and HMR methods has come at the cost of ever-increasing computation and memory needs. To address both problems simultaneously, we propose FeatER, a novel transformer design that preserves the inherent structure of feature map representations when modeling attention while reducing memory and computational costs. Taking advantage of FeatER, we build an efficient network for a set of human reconstruction tasks including 2D HPE, 3D HPE, and HMR. A feature map reconstruction module is applied to improve the quality of the estimated human pose and mesh. Extensive experiments demonstrate the effectiveness of FeatER on various human pose and mesh datasets. For instance, FeatER outperforms the SOTA method MeshGraphormer while requiring only 5% of the Params and 16% of the MACs on the Human3.6M and 3DPW datasets. Code is available at https://github.com/zczcwh/FeatER.
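To make the heatmap representation and the flattening issue concrete, the following is a minimal illustrative sketch, not code from FeatER. The function names, the 64x48 map size, and the value of sigma are assumptions chosen only for this example. It encodes one joint as a Gaussian heatmap, decodes it by taking the argmax, and shows how row-major flattening separates vertically adjacent pixels by a full row width.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Return a height x width heatmap (list of rows) peaking at joint (cx, cy).

    The size and sigma are illustrative assumptions, not values from the paper.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_joint(heatmap):
    """Recover the (x, y) joint location as the argmax of the heatmap."""
    best = max(
        ((value, x, y) for y, row in enumerate(heatmap) for x, value in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]

def flatten(heatmap):
    """Row-major flattening into a 1D sequence of values."""
    return [value for row in heatmap for value in row]

def flat_index(x, y, width):
    """Position of pixel (x, y) in the row-major flattened sequence."""
    return y * width + x

height, width = 64, 48
heatmap = gaussian_heatmap(height, width, cx=20, cy=30)
print(decode_joint(heatmap))  # (20, 30)

# Vertically adjacent pixels end up `width` positions apart after flattening.
gap = flat_index(20, 31, width) - flat_index(20, 30, width)
print(gap)  # 48
```

In the 2D map the pixels (20, 30) and (20, 31) are direct neighbors, but in the flattened sequence they sit 48 positions apart, which is the kind of lost spatial structure the abstract refers to.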