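One minute a day to read your way through papers from top robotics conferences
Title: Ensemble Deep Learning for Skeleton-based Action Recognition using Temporal Sliding LSTM networks
Authors: Inwoong Lee, Doyoung Kim, Seoungyoon Kang, Sanghoon Lee
Source: ICCV 2017 (IEEE International Conference on Computer Vision)
Compiled by: 李建禹
Reviewed by: 陈世浪
Individuals are welcome to share this post on WeChat Moments; other organizations or media accounts that wish to repost it should leave a message in the account backend to request authorization.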
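Summary
This paper addresses feature representation of skeleton joints and modeling of temporal dynamics for recognizing human actions. Traditional methods generally use relative coordinate systems that depend on certain joints, and they model only long-term dependencies while leaving out short-term and medium-term ones. Instead of taking raw skeletons as input, the paper transforms the skeletons into another coordinate system to gain robustness to scale, rotation, and translation, and then extracts salient motion features from them. Since LSTM networks with different time-step sizes can model different attributes well, the paper proposes novel ensemble Temporal Sliding LSTM (TS-LSTM) networks for skeleton-based action recognition. The proposed network consists of multiple parts that contain short-term, medium-term, and long-term TS-LSTM networks, respectively. An average ensemble over these parts is used as the final feature to capture the various temporal dependencies.
The paper evaluates the proposed networks alongside several other architectures to verify their effectiveness, and compares them with other methods on five challenging datasets. The experimental results show that the network models achieve state-of-the-art performance by using a variety of temporal features. In addition, the relation between the recognized actions and the multi-term TS-LSTM features is analyzed by visualizing the softmax features of the multiple parts.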
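To make the idea of temporal sliding windows and an average ensemble more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: `part_softmax` is a hypothetical stand-in that replaces each LSTM with mean pooling and a random, untrained linear scorer, and the window sizes, strides, and feature dimensions are all assumed values chosen only for illustration.

```python
import math
import random


def sliding_windows(frames, window, stride):
    """Split a list of per-frame feature vectors into overlapping windows."""
    last_start = len(frames) - window
    return [frames[s:s + window] for s in range(0, last_start + 1, stride)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_softmax(frames, window, stride, weights):
    """Hypothetical stand-in for one TS-LSTM part (NOT an LSTM).

    Each window is mean-pooled and scored by a linear map; the per-window
    softmax vectors are then averaged into one class distribution.
    """
    windows = sliding_windows(frames, window, stride)
    totals = [0.0] * len(weights)
    for win in windows:
        pooled = [sum(col) / len(win) for col in zip(*win)]
        scores = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
        totals = [t + p for t, p in zip(totals, softmax(scores))]
    return [t / len(windows) for t in totals]


def average_ensemble(part_outputs):
    """Average the class distributions produced by the individual parts."""
    return [sum(col) / len(part_outputs) for col in zip(*part_outputs)]


random.seed(0)
NUM_FRAMES, FEATURE_DIM, NUM_CLASSES = 30, 6, 4  # illustrative sizes
frames = [[random.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
          for _ in range(NUM_FRAMES)]

# Assumed (window, stride) pairs; the paper's actual settings are not given here.
SETTINGS = {"short": (5, 2), "medium": (10, 5), "long": (30, 30)}

part_outputs = []
for name, (window, stride) in SETTINGS.items():
    weights = [[random.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
               for _ in range(NUM_CLASSES)]  # random, untrained weights
    part_outputs.append(part_softmax(frames, window, stride, weights))

ensemble = average_ensemble(part_outputs)
predicted_class = max(range(NUM_CLASSES), key=lambda c: ensemble[c])
print(predicted_class, [round(p, 3) for p in ensemble])
```

The short-term part sees many small windows while the long-term part sees the whole sequence once, so averaging their softmax outputs lets each time scale contribute to the final decision. In a real model each part would be a trained recurrent network rather than this linear stand-in.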
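Figure 1: Overall framework of the system
Figure 2: Conceptual diagram of the proposed TS-LSTM module
Figure 3: Overall architecture composed of short-term, medium-term, long-term, and pose TS-LSTM modules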
Abstract
This paper addresses the problems of feature representation of skeleton joints and the modeling of temporal dynamics to recognize human actions. Traditional methods generally use relative coordinate systems dependent on some joints, and model only the long-term dependency, while excluding short-term and medium-term dependencies. Instead of taking raw skeletons as the input, we transform the skeletons into another coordinate system to obtain the robustness to scale, rotation and translation, and then extract salient motion features from them. Considering that Long Short-term Memory (LSTM) networks with various time-step sizes can model various attributes well, we propose novel ensemble Temporal Sliding LSTM (TS-LSTM) networks for skeleton-based action recognition. The proposed network is composed of multiple parts containing short-term, medium-term and long-term TS-LSTM networks, respectively. In our network, we utilize an average ensemble among multiple parts as a final feature to capture various temporal dependencies. We evaluate the proposed networks and the additional other architectures to verify the effectiveness of the proposed networks, and also compare them with several other methods on five challenging datasets. The experimental results demonstrate that our network models achieve the state-of-the-art performance through various temporal features. Additionally, we analyze a relation between the recognized actions and the multi-term TS-LSTM features by visualizing the softmax features of multiple parts.
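The abstract does not spell out the coordinate transform, so the following is only a rough sketch of what translation and scale normalization followed by frame-difference motion features might look like. The joint indices `root` and `ref`, the toy three-joint skeleton, and the `step` parameter are assumptions made for illustration, and rotation alignment is deliberately left out.

```python
import math


def normalize_skeleton(joints, root=0, ref=1):
    """Center one frame on the root joint and scale by the root-to-ref distance.

    `root` and `ref` are hypothetical joint indices chosen for illustration.
    """
    length = math.dist(joints[root], joints[ref])
    if length == 0.0:
        raise ValueError("root and ref joints coincide; cannot normalize scale")
    rx, ry, rz = joints[root]
    return [((x - rx) / length, (y - ry) / length, (z - rz) / length)
            for x, y, z in joints]


def motion_features(sequence, step=1):
    """Per-joint displacement between frame t and frame t + step."""
    return [[tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
            for f0, f1 in zip(sequence, sequence[step:])]


# Toy three-joint skeleton and a copy that is scaled by 3 and shifted.
frame = [(1.0, 2.0, 3.0), (1.0, 2.0, 5.0), (2.0, 2.0, 3.0)]
moved = [(3 * x + 10.0, 3 * y - 4.0, 3 * z + 7.0) for x, y, z in frame]

normalized = normalize_skeleton(frame)
normalized_moved = normalize_skeleton(moved)

# Motion between two normalized frames (in the second, joint 2 moves along x).
next_frame = [(1.0, 2.0, 3.0), (1.0, 2.0, 5.0), (3.0, 2.0, 3.0)]
motion = motion_features([normalized, normalize_skeleton(next_frame)])
```

After normalization the original and the shifted, rescaled copy yield the same coordinates, which is the kind of robustness the abstract refers to. A larger `step` would produce coarser displacements between frames that are further apart.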