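Project title: Research on Human Action Analysis Based on Multi-Channel Deep Convolutional Neural Networks
Project number: No.61502152
Project type: Young Scientists Fund project
Year approved: 2016
Discipline: Computer science
Principal investigator: 彭小江 (Peng Xiaojiang)
Institution: 衡阳师范学院 (Hengyang Normal University)
Funding: 200,000 CNY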
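Chinese abstract (translated): Human action analysis in video has broad application prospects, such as intelligent video surveillance, video retrieval, and human-computer interaction. Because of variations in action speed, camera viewpoint, and background complexity, it has long been a difficult research problem. At present, human action analysis relies mainly on hand-crafted features, such as space-time interest points and dense trajectory features. This project holds that, at the current stage of action analysis, hand-crafted features have limited prospects and can hardly bring breakthrough progress. Deep learning methods operate directly on raw signals and, through multi-layer convolution, local pooling, and supervised feedback learning, have achieved great success in related fields. Addressing deep learning for human action analysis, this project mainly studies deep convolutional neural network models: drawing on the theory that appearance, motion, and depth information are relatively independent in visual perception, it proposes a multi-channel deep convolutional neural network model; considering the negative effect of the diversity of human actions on multi-frame training of this model, it proposes a strategy for coarsely aligning human action data based on multiple CNN features and dynamic time warping (DTW); it proposes a method for human action similarity verification based on this model; and it proposes a unified framework for human pose estimation and action recognition based on this model.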
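Chinese keywords (translated): human action analysis; deep convolutional neural networks; video representation; depth information; human pose estimation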
English abstract: Video-based human action analysis has a wide range of applications, such as smart video surveillance, content-based video retrieval, and human-computer interaction. Owing to variations in action speed, viewpoint, and background complexity, human action analysis remains a challenging research topic. Currently, most approaches to human action analysis are based on hand-crafted features, e.g., space-time interest points and dense trajectory features. Considering the progress of action analysis to date, we believe that the performance of hand-crafted features is limited and that these features are unlikely to bring breakthrough progress for video-based human action analysis. By contrast, deep learning methods, which operate directly on raw signals through multi-layer convolution, local pooling, and supervised feedback learning, have been highly successful in related research fields. Focusing on deep learning approaches to human action analysis, we mainly explore deep convolutional neural networks (DCNN) for video. Inspired by the theory of human visual perception that the perceptions of appearance, motion, and depth are relatively independent, we first propose multi-stream DCNN (MS-DCNN) based human action analysis. Because the variation in human actions can harm the training of a multi-frame DCNN model, we present a coarse alignment strategy for human action frames based on multiple CNN features and dynamic time warping (DTW). Moreover, building on our previous work in this field, we propose to apply our model to the action similarity labeling task. Finally, we explore a unified framework for human pose estimation and human action recognition.
English keywords: human action analysis; deep convolutional neural networks; video representation; depth information; human pose estimation
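
Illustrative note on the DTW-based coarse alignment named in the abstract: the record does not specify which DTW variant or which CNN features the project uses, so the following is only a minimal sketch of textbook dynamic time warping under stated assumptions. It assumes each frame is already represented by a fixed-length feature vector (for example a CNN feature), uses Euclidean distance between frames, and allows the three standard step moves. The function name dtw_align and the toy inputs are hypothetical.

```python
import math


def dtw_align(seq_a, seq_b):
    """Align two sequences of per-frame feature vectors with classic DTW.

    Assumption: each element is a fixed-length list of floats (e.g. a
    per-frame CNN feature); frame distance is Euclidean.
    Returns (total_cost, path), where path is a list of (i, j) frame pairs.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")

    # acc[i][j] = minimal cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end to recover the frame-to-frame correspondence.
    # On ties the diagonal move is preferred (it is listed first).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return acc[n][m], path


# Toy example: the second clip repeats its middle frame.
clip_a = [[0.0], [1.0], [2.0]]
clip_b = [[0.0], [1.0], [1.0], [2.0]]
cost, path = dtw_align(clip_a, clip_b)
print(cost, path)
```

In this toy case the middle frame of the first clip is matched to both repeated frames of the second clip, which is the kind of coarse temporal correspondence that could be used to line up clips of different speed before multi-frame training.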