We present a deep neural network (DNN) that uses both sensor data (gyroscope) and image content (optical flow) to stabilize videos through unsupervised learning. The network fuses optical flow with real and virtual camera pose histories into a joint motion representation. An LSTM block then infers the new virtual camera pose, which is used to generate a warping grid that stabilizes the frame. We introduce a novel relative motion representation and a multi-stage training process to optimize our model without any supervision. To the best of our knowledge, this is the first DNN solution that adopts both sensor data and images for stabilization. We validate the proposed framework through ablation studies and demonstrate that the proposed method outperforms state-of-the-art alternatives in quantitative evaluations and a user study.
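A minimal PyTorch sketch of the pipeline described above may help fix ideas. All names, dimensions, and the quaternion pose parameterization here are illustrative assumptions, not the paper's actual implementation; the sketch only mirrors the stated flow of fusing optical-flow features with real/virtual pose histories, inferring a new virtual pose with an LSTM, and warping the frame with a sampling grid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StabilizerNet(nn.Module):
    """Hypothetical sketch of the fusion + LSTM stabilization pipeline."""
    def __init__(self, flow_dim=64, pose_dim=4, hist_len=10, hidden=256):
        super().__init__()
        # Encode a compact optical-flow feature vector (assumed precomputed).
        self.flow_enc = nn.Sequential(
            nn.Linear(flow_dim, 128), nn.ReLU(), nn.Linear(128, 128))
        # Encode concatenated real/virtual camera pose histories (quaternions).
        self.pose_enc = nn.Sequential(
            nn.Linear(2 * hist_len * pose_dim, 128), nn.ReLU())
        # LSTM infers the next virtual camera pose from the joint representation.
        self.lstm = nn.LSTM(input_size=256, hidden_size=hidden, batch_first=True)
        self.pose_head = nn.Linear(hidden, pose_dim)

    def forward(self, flow_feat, real_hist, virt_hist, state=None):
        # flow_feat: (B, flow_dim); real_hist, virt_hist: (B, hist_len, pose_dim)
        poses = torch.cat([real_hist, virt_hist], dim=1).flatten(1)
        # Joint motion representation: fused flow and pose-history features.
        joint = torch.cat([self.flow_enc(flow_feat), self.pose_enc(poses)], dim=1)
        out, state = self.lstm(joint.unsqueeze(1), state)
        # Normalize to a unit quaternion for the new virtual camera pose.
        q = F.normalize(self.pose_head(out.squeeze(1)), dim=1)
        return q, state

def warp_frame(frame, grid):
    # frame: (B, C, H, W); grid: (B, H, W, 2) normalized sampling coordinates,
    # derived in practice from the relative real-to-virtual camera rotation.
    return F.grid_sample(frame, grid, align_corners=True)
```

Because the model only predicts a pose and the warp is a differentiable `grid_sample`, stability and distortion losses on the warped frames can be backpropagated end to end, which is what makes the unsupervised multi-stage training plausible.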