We propose a method for generating a temporally remapped video that matches a desired target duration while maximally preserving natural video dynamics. Our approach trains a neural network through self-supervision to recognize and accurately localize temporally varying changes in video playback speed. To re-time a video, we (1) use the model to infer the slowness of individual video frames, and (2) optimize the temporal frame sub-sampling to be consistent with the model's slowness predictions. We demonstrate that this model detects playback-speed variations more accurately while being orders of magnitude more efficient than prior approaches. Furthermore, we propose an optimization for video re-timing that enables precise control over the target duration and performs more robustly on longer videos than prior methods. We evaluate the model quantitatively on artificially sped-up videos, through transfer to action recognition, and qualitatively through user studies.
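To make the two-step re-timing idea concrete, here is a minimal sketch of step (2) only, under simplifying assumptions: suppose the model's per-frame slowness predictions are given as an array of positive weights (a hypothetical `slowness` input; higher means the frame should occupy more time in the output). Treating the weights as per-frame durations, we build a cumulative timeline and sample it uniformly to pick output frame indices, so that frames with larger slowness are sampled more densely. The names `retime_by_slowness`, `slowness`, and `target_len` are illustrative, not from the paper, and the paper's actual optimization for duration-constrained sub-sampling is more involved.

```python
import numpy as np

def retime_by_slowness(slowness, target_len):
    """Pick target_len input-frame indices consistent with per-frame slowness.

    slowness: 1-D array of positive per-frame weights (hypothetical model
    output); a larger weight means the frame should occupy more output time.
    Returns a non-decreasing array of target_len frame indices.
    """
    # Cumulative "natural" timeline, normalized to [0, 1].
    cum = np.cumsum(np.asarray(slowness, dtype=float))
    cum /= cum[-1]
    # Uniformly spaced sample points (midpoints of target_len bins).
    grid = (np.arange(target_len) + 0.5) / target_len
    # Map each uniform sample back to the input frame covering that time.
    return np.searchsorted(cum, grid)
```

With uniform slowness this reduces to plain uniform sub-sampling, while a region of high predicted slowness (e.g. a single heavily sped-up frame) is selected repeatedly, stretching it back out in the output.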