Long-term object detection requires integrating frame-based results over several seconds. For non-deformable objects, long-term detection is often addressed by object detection followed by video tracking. Unfortunately, tracking is inapplicable to objects that undergo dramatic changes in appearance from frame to frame. As an example of this problem, we study hand detection over long video recordings in collaborative learning environments. More specifically, we develop long-term hand detection methods that can handle partial occlusions and dramatic changes in appearance. Our approach applies object detection followed by time projections, clustering, and small-region removal to provide effective hand detection over long videos. The hand detector achieved an average precision (AP) of 72% at an intersection over union (IoU) threshold of 0.5. Our optimized data augmentation approach improved the AP to 81%. The method runs at 4.7x real-time with an AP of 81% at 0.5 IoU. By improving IoU ratios from 0.2 to 0.5, our method reduced the number of false-positive hand detections by 80%. The overall hand detection system runs at 4x real-time.
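The abstract does not specify how the time projection, clustering, and small-region removal steps are implemented. The Python sketch below shows one plausible reading under explicit assumptions. Per-frame detections are assumed to be axis-aligned boxes. "Time projection" is taken to mean counting, for each pixel, the number of frames in which some detection covers it. Clustering is approximated by 4-connected component labeling of pixels whose count reaches a hypothetical `min_frames` threshold. Small-region removal discards components smaller than a hypothetical `min_area`. The names `time_projection`, `long_term_regions`, `min_frames`, and `min_area` are illustrative and do not come from the authors' code. The standard IoU computation behind the AP-at-0.5-IoU figures is included for reference.

```python
from collections import deque

# Hypothetical sketch, NOT the authors' implementation.
# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive.
Box = tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def time_projection(frames: list[list[Box]], width: int, height: int) -> list[list[int]]:
    """Assumed projection: per pixel, count the frames in which any detection covers it."""
    counts = [[0] * width for _ in range(height)]
    for boxes in frames:
        # Mark each pixel at most once per frame, even if boxes overlap.
        covered = [[False] * width for _ in range(height)]
        for x1, y1, x2, y2 in boxes:
            for y in range(max(0, y1), min(height, y2)):
                for x in range(max(0, x1), min(width, x2)):
                    covered[y][x] = True
        for y in range(height):
            for x in range(width):
                if covered[y][x]:
                    counts[y][x] += 1
    return counts


def long_term_regions(frames: list[list[Box]], width: int, height: int,
                      min_frames: int, min_area: int) -> list[Box]:
    """Threshold the projection, group active pixels into 4-connected regions
    (a stand-in for the unspecified clustering step), and drop small regions."""
    counts = time_projection(frames, width, height)
    active = [[c >= min_frames for c in row] for row in counts]
    seen = [[False] * width for _ in range(height)]
    regions: list[Box] = []
    for sy in range(height):
        for sx in range(width):
            if not active[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill collects one connected region.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            pixels = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and active[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(pixels) < min_area:
                continue  # small-region removal
            xs = [p[0] for p in pixels]
            ys = [p[1] for p in pixels]
            regions.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return regions


if __name__ == "__main__":
    # Five frames with a persistent "hand" box and a persistent one-pixel blob,
    # plus a spurious detection that appears in a single frame only.
    frames = [[(2, 2, 6, 6), (12, 0, 13, 1)] for _ in range(5)]
    frames[0].append((10, 10, 11, 11))
    print(long_term_regions(frames, 16, 16, min_frames=3, min_area=4))
    # [(2, 2, 6, 6)]
```

In this sketch, the single-frame detection is suppressed by the `min_frames` threshold, and the persistent one-pixel blob is suppressed by small-region removal. Counting frames rather than summing detector scores tolerates a hand that is briefly missed or occluded. Raising `min_frames` trades recall for fewer spurious regions.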