This report describes our submission, "TarHeels," to the Ego4D: Object State Change Classification Challenge. We use a transformer-based video recognition model and leverage the Divided Space-Time Attention mechanism to classify object state changes in egocentric videos. Our submission achieves the second-best performance in the challenge. Furthermore, we perform an ablation study showing that identifying object state changes in egocentric videos requires temporal modeling ability. Lastly, we present several positive and negative examples to visualize our model's predictions. The code is publicly available at: https://github.com/md-mohaiminul/ObjectStateChange
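To make the mechanism concrete, below is a minimal illustrative sketch (not the authors' submission code) of Divided Space-Time Attention as used in TimeSformer-style models: temporal self-attention over frames at each spatial location, followed by spatial self-attention among the patches within each frame. The class name, embedding dimension, head count, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class DividedSpaceTimeBlock(nn.Module):
    """Hypothetical sketch of one divided space-time attention block."""

    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_s = nn.LayerNorm(dim)

    def forward(self, x):
        # x: (batch, frames, patches, dim) -- patch tokens of a video clip
        b, t, p, d = x.shape

        # Temporal attention: each spatial patch attends across the T frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt_norm = self.norm_t(xt)
        xt = xt + self.temporal_attn(xt_norm, xt_norm, xt_norm)[0]
        x = xt.reshape(b, p, t, d).permute(0, 2, 1, 3)

        # Spatial attention: each frame's patches attend to one another.
        xs = x.reshape(b * t, p, d)
        xs_norm = self.norm_s(xs)
        xs = xs + self.spatial_attn(xs_norm, xs_norm, xs_norm)[0]
        return xs.reshape(b, t, p, d)


if __name__ == "__main__":
    block = DividedSpaceTimeBlock(dim=768, heads=12)
    clip_tokens = torch.randn(2, 8, 196, 768)  # 2 clips, 8 frames, 14x14 patches
    out = block(clip_tokens)
    print(out.shape)  # torch.Size([2, 8, 196, 768])
```

Factorizing attention this way keeps the cost linear in the number of frames times patches rather than quadratic in their product, which is what makes joint spatiotemporal modeling of long egocentric clips tractable.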