In this work, we present our solution to the EPIC-KITCHENS-100 2022 Action Detection challenge. We propose a One-stage Action Detection Transformer (OADT) that models the temporal connections among video segments. With OADT, both the category and the temporal boundaries of an action can be recognized simultaneously. By ensembling multiple OADT models trained on different features, our method reaches 21.28\% action mAP and ranks first on the test set of the Action Detection challenge.