We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual descriptor and scalable search over adaptively chunked video content. A transformer-based alignment module then refines the temporal localization of the query fragment within the matched video. A space-time comparator module identifies regions of manipulation between the aligned content, while remaining invariant to residual temporal misalignment and to artifacts arising from non-editorial changes to the content. Robustly matching a video to a trusted source allows conclusions to be drawn about its provenance, enabling informed trust decisions on content encountered.
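
To make the coarse matching and alignment stage concrete, the following is a minimal sketch and not VADER's implementation. It assumes fixed-length chunks instead of adaptive chunking, random vectors standing in for the robust visual descriptor, and an exhaustive sliding-window search instead of a scalable index. The names `chunk_descriptors` and `coarse_align` are hypothetical.

```python
import numpy as np

def chunk_descriptors(frame_feats, chunk_len):
    """Mean-pool per-frame features into fixed-length chunks, then L2-normalize.

    Assumption: fixed-length chunks stand in for adaptive chunking.
    """
    n_chunks = len(frame_feats) // chunk_len
    trimmed = frame_feats[: n_chunks * chunk_len]
    chunks = trimmed.reshape(n_chunks, chunk_len, -1).mean(axis=1)
    norms = np.linalg.norm(chunks, axis=1, keepdims=True)
    return chunks / (norms + 1e-12)

def coarse_align(query_chunks, index):
    """Return (video_id, start_chunk, score) of the best contiguous match.

    index maps a video id to its (n_chunks, dim) chunk descriptors.
    The score is the mean cosine similarity along the aligned diagonal.
    """
    q = len(query_chunks)
    steps = np.arange(q)
    best = (None, -1, -np.inf)
    for video_id, cand in index.items():
        if len(cand) < q:
            continue  # candidate is shorter than the query fragment
        sim = query_chunks @ cand.T  # (q, n_chunks) cosine similarities
        for start in range(len(cand) - q + 1):
            score = float(sim[steps, start + steps].mean())
            if score > best[2]:
                best = (video_id, start, score)
    return best

# Toy data: three 200-frame "videos" with 64-dimensional per-frame features.
rng = np.random.default_rng(0)
chunk_len = 10
videos = {name: rng.standard_normal((200, 64)) for name in ("a", "b", "c")}
index = {name: chunk_descriptors(f, chunk_len) for name, f in videos.items()}

# The query is frames 60..99 of video "b" with mild noise, mimicking a
# non-editorial change such as re-encoding.
query_frames = videos["b"][60:100] + 0.05 * rng.standard_normal((40, 64))
query_chunks = chunk_descriptors(query_frames, chunk_len)

video_id, start_chunk, score = coarse_align(query_chunks, index)
print(video_id, start_chunk * chunk_len, round(score, 3))
```

On this toy data the query is localized to video "b" at chunk offset 6, which corresponds to frame 60. The offset is only accurate to the chunk length, which is why a finer temporal refinement step, such as the transformer-based alignment module described above, would follow in a full pipeline.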