In this technical report, we briefly introduce the solution of our team `PKU-WICT-MIPL' for the PIC Makeup Temporal Video Grounding (MTVG) Challenge at ACM-MM 2022. Given an untrimmed makeup video and a step query, MTVG aims to localize the temporal moment of the target makeup step in the video. To tackle this task, we propose a phrase relationship mining framework that exploits the temporal localization relationships relevant to both fine-grained phrases and the whole sentence. In addition, we use a dynamic programming algorithm to constrain the localization results of different step sentence queries so that they do not overlap with each other. The experimental results demonstrate the effectiveness of our method. Our final submission ranked 2nd on the leaderboard, with only a 0.55\% gap from first place.
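As an illustration of the non-overlap constraint, the following minimal sketch shows one possible dynamic programming formulation. The text above does not specify the exact formulation, so everything here is an assumption rather than the team's implementation: we assume that each step query comes with a list of scored candidate segments, that the queries are listed in the temporal order of the steps, and that the goal is to pick one candidate per query with maximal total score. The names `Segment` and `select_non_overlapping` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate format: (start_time, end_time, score).
Segment = Tuple[float, float, float]


def select_non_overlapping(candidates: List[List[Segment]]) -> List[Segment]:
    """Pick one segment per query so that consecutive picks do not overlap.

    Assumption (not stated in the report): candidates[q] belongs to the
    q-th step, and steps occur in the video in query order.
    Raises ValueError if no non-overlapping assignment exists.
    """
    if not candidates:
        return []
    neg_inf = float("-inf")
    best: List[List[float]] = []  # best[q][i]: best total score ending at candidate i
    back: List[List[int]] = []    # back[q][i]: chosen candidate index for query q-1
    for q, segs in enumerate(candidates):
        best_q, back_q = [], []
        for start, _end, score in segs:
            if q == 0:
                best_q.append(score)
                back_q.append(-1)
                continue
            top, arg = neg_inf, -1
            # Only predecessors that end before this candidate starts are allowed.
            for j, (_s, prev_end, _sc) in enumerate(candidates[q - 1]):
                if prev_end <= start and best[q - 1][j] > top:
                    top, arg = best[q - 1][j], j
            best_q.append(top + score if arg >= 0 else neg_inf)
            back_q.append(arg)
        best.append(best_q)
        back.append(back_q)

    last = best[-1]
    if not last or max(last) == neg_inf:
        raise ValueError("no non-overlapping assignment exists")

    # Backtrack from the best final candidate to recover one pick per query.
    idx = max(range(len(last)), key=lambda i: last[i])
    picks: List[Segment] = []
    for q in range(len(candidates) - 1, -1, -1):
        picks.append(candidates[q][idx])
        idx = back[q][idx]
    picks.reverse()
    return picks


if __name__ == "__main__":
    demo = [
        [(0.0, 10.0, 0.9), (0.0, 5.0, 0.6)],
        [(4.0, 12.0, 0.8), (10.0, 15.0, 0.7), (5.0, 9.0, 0.5)],
    ]
    print(select_non_overlapping(demo))
```

In the demo, taking the top-scoring candidate of each query independently would select two overlapping segments, whereas the joint selection trades a slightly lower score on the second query for a consistent timeline. The cost is quadratic in the number of candidates per query, which is acceptable for short candidate lists.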