This report briefly describes our winning solution to the AVA Active Speaker Detection (ASD) task at the ActivityNet Challenge 2022. Our underlying model, UniCon+, builds on our previous work, the Unified Context Network (UniCon) and Extended UniCon, which are designed for robust scene-level ASD. We augment the architecture with a simple GRU-based module that allows information about recurring identities to flow across scenes through read and update operations. We report a best result of 94.47% mAP on the AVA-ActiveSpeaker test set, which continues to rank first on this year's challenge leaderboard and significantly advances the state of the art.