Current video/action understanding systems have demonstrated impressive performance on large recognition tasks. However, they may be limited to recognizing spatiotemporal patterns rather than thoroughly understanding the actions. To spur progress toward a truer, deeper understanding of videos, we introduce the task of win-fail action recognition: differentiating between successful and failed attempts at various activities. We introduce a first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: "General Stunts," "Internet Wins-Fails," "Trick Shots," and "Party Games." Unlike in existing action recognition datasets, intra-class variation is high, making the task challenging yet feasible. We systematically analyze the characteristics of the win-fail task/dataset with prototypical action recognition networks and a novel video retrieval task. While current action recognition methods work well on our task/dataset, a large gap remains before high performance is reached. We hope to motivate more work toward the true understanding of actions/videos. The dataset will be available from https://github.com/ParitoshParmar/Win-Fail-Action-Recognition.