Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation - 专知 Papers


January 11, 2022


Wenhao Li, Hong Liu, Runwei Ding, Mengyuan Liu, Pichao Wang, Wenming Yang

From arXiv. Accepted by IEEE Transactions on Multimedia. Open source.

Despite great progress in 3D human pose estimation from videos, it remains an open problem to take full advantage of a redundant 2D pose sequence to learn a representative feature for generating one 3D pose. To this end, we propose an improved Transformer-based architecture, called Strided Transformer, which simply and effectively lifts a long sequence of 2D joint locations to a single 3D pose. Specifically, a Vanilla Transformer Encoder (VTE) is adopted to model long-range dependencies of 2D pose sequences. To reduce the redundancy of the sequence, the fully-connected layers in the feed-forward network of VTE are replaced with strided convolutions that progressively shrink the sequence length and aggregate information from local contexts. The modified VTE is termed the Strided Transformer Encoder (STE) and is built upon the outputs of VTE. STE not only aggregates long-range information into a single-vector representation in a hierarchical global and local fashion, but also significantly reduces the computation cost. Furthermore, a full-to-single supervision scheme is designed at both the full-sequence scale and the single-target-frame scale, applied to the outputs of VTE and STE, respectively. This scheme imposes extra temporal smoothness constraints in conjunction with the single-target-frame supervision and hence helps produce smoother and more accurate 3D poses. The proposed Strided Transformer is evaluated on two challenging benchmark datasets, Human3.6M and HumanEva-I, and achieves state-of-the-art results with fewer parameters. Code and models are available at https://github.com/Vegetebird/StridedTransformer-Pose3D.
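The way strided convolutions shrink a sequence down to a single frame can be illustrated with a little arithmetic. The sketch below is not taken from the released code: it only applies the standard output-length formula for a 1D convolution (dilation 1), and the 27-frame input, kernel size 3, and stride 3 per layer are illustrative assumptions chosen so that the sequence collapses to length 1. The helper names are hypothetical.

```python
def conv1d_output_length(length, kernel_size, stride, padding=0):
    """Output length of a 1D convolution with dilation 1."""
    return (length + 2 * padding - kernel_size) // stride + 1


def shrink_schedule(length, strides, kernel_size=3):
    """Sequence length before and after each strided layer.

    Assumes (hypothetically) 'same'-style padding of kernel_size // 2,
    so each layer divides the length by roughly its stride.
    """
    padding = kernel_size // 2
    lengths = [length]
    for stride in strides:
        length = conv1d_output_length(length, kernel_size, stride, padding)
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    # Assumed setup: 27 input frames, three layers with stride 3 each.
    print(shrink_schedule(27, [3, 3, 3]))  # [27, 9, 3, 1]
```

Under these assumptions, each layer processes a sequence one third the length of the previous one, which is one way to see why progressively shrinking the sequence lowers computation compared with keeping the full length in every layer.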

