In this paper we present a three-stream algorithm for real-time action recognition, together with a new dataset of handwashing videos, with the intent of aligning action recognition with real-world constraints. The proposed three-stream fusion algorithm runs both accurately and efficiently in real time, even on low-powered systems such as a Raspberry Pi. The cornerstone of the algorithm is the incorporation of spatial and temporal information, as well as information about the objects in a video, while using an efficient architecture and Optical Flow computation to achieve commendable results in real time. The algorithm is benchmarked on the UCF-101 and HMDB-51 datasets, achieving accuracies of 92.7% and 64.9% respectively. Notably, the algorithm is also able to learn the intricate differences between extremely similar actions, which would be difficult to distinguish even for the human eye. Additionally, noting the dearth of datasets for the recognition of very similar or fine-grained actions, this paper also introduces a new, publicly available dataset, the Hand Wash Dataset, with the intent of establishing a new benchmark for future fine-grained action recognition tasks.
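The abstract does not specify how the three streams are combined; a common approach for multi-stream action recognition is weighted late fusion of per-class scores. The sketch below illustrates that idea only — the function names, example scores, and equal stream weights are all hypothetical, not the paper's actual method.

```python
import math

def softmax(scores):
    # Convert raw per-class scores to a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_three_streams(spatial, temporal, objects, weights=(1.0, 1.0, 1.0)):
    # Weighted late fusion: softmax each stream's class scores,
    # then take a weighted sum per class. All weights here are assumed.
    probs = [softmax(s) for s in (spatial, temporal, objects)]
    n_classes = len(spatial)
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(n_classes)]

# Hypothetical per-class scores for 4 action classes from each stream.
spatial_scores = [2.0, 0.5, 0.1, 0.1]
temporal_scores = [1.5, 1.0, 0.2, 0.1]
object_scores = [1.8, 0.4, 0.3, 0.2]

fused = fuse_three_streams(spatial_scores, temporal_scores, object_scores)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Late fusion of this kind keeps each stream's network independent, which suits low-powered deployment since the streams can be computed and combined with negligible fusion overhead.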