Recent advances in deep learning and computer vision offer an excellent opportunity to investigate high-level visual analysis tasks such as human localization and human pose estimation. Although the performance of both tasks has improved significantly in recent reports, neither is perfect, and erroneous localizations and pose estimates can be expected in some video frames. Studies on integrating these techniques into a generic pipeline that is robust to the noise introduced by such errors are still lacking. This paper addresses that gap. We explored and developed two working pipelines suited to visual-based positioning and pose estimation tasks, and analyzed them on a badminton game. We showed that the tracking-by-detection concept could work well, and that errors in position and pose could be effectively handled by linear interpolation using information from nearby frames. The results showed that Visual-based Positioning and Pose Estimation could deliver position and pose estimates with good spatial and temporal resolution.
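As an illustration of the interpolation idea mentioned above, the following is a minimal sketch, not the pipeline used in the paper. It assumes that a frame with a failed or rejected detection is marked as `None` and that each valid frame carries a single 2D position. The function name `fill_missing_positions` and the data layout are hypothetical, chosen only for this example.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def fill_missing_positions(track: List[Optional[Point]]) -> List[Optional[Point]]:
    """Fill None entries by linear interpolation between the nearest valid frames.

    Assumption for this sketch: gaps at the very start or end of the track
    have only one valid neighbour, so they are left as None rather than
    extrapolated.
    """
    filled = list(track)
    # Indices of frames that hold a usable detection.
    valid = [i for i, p in enumerate(track) if p is not None]
    # Walk over consecutive pairs of valid frames and fill the gap between them.
    for left, right in zip(valid, valid[1:]):
        x0, y0 = track[left]
        x1, y1 = track[right]
        for i in range(left + 1, right):
            t = (i - left) / (right - left)  # fractional position inside the gap
            filled[i] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled


# Hypothetical track: frames 1-3 and frame 5 have no usable detection.
track = [(0.0, 0.0), None, None, None, (4.0, 8.0), None]
print(fill_missing_positions(track))
# [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0), None]
```

The same per-coordinate interpolation could in principle be applied to each pose keypoint independently, although whether the paper does so in exactly this way is not stated in the abstract.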