We present a method that decomposes, or "unwraps", an input video into a set of layered 2D atlases, each providing a unified representation of the appearance of an object (or background) over the video. For each pixel in the video, our method estimates its corresponding 2D coordinate in each of the atlases, giving us a consistent parameterization of the video, along with an associated alpha (opacity) value. Importantly, we design our atlases to be interpretable and semantic, which facilitates easy and intuitive editing in the atlas domain, with minimal manual work required. Edits applied to a single 2D atlas (or input video frame) are automatically and consistently mapped back to the original video frames, while preserving occlusions, deformation, and other complex scene effects such as shadows and reflections. Our method employs a coordinate-based Multilayer Perceptron (MLP) representation for mappings, atlases, and alphas, which are jointly optimized on a per-video basis, using a combination of video reconstruction and regularization losses. By operating purely in 2D, our method does not require any prior 3D knowledge about scene geometry or camera poses, and can handle complex, dynamic real-world videos. We demonstrate various video editing applications, including texture mapping, video style transfer, image-to-video texture transfer, and segmentation/labeling propagation, all automatically produced by editing a single 2D atlas image.
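The per-pixel composition described above can be illustrated with a minimal NumPy sketch. All names, network sizes, and the two-layer (foreground/background) setup here are illustrative assumptions, not the paper's implementation: each mapping, atlas, and alpha network is stood in for by a tiny randomly initialized MLP, whereas the actual method jointly optimizes these networks per video with reconstruction and regularization losses.

```python
import numpy as np

def mlp(params, x):
    """Forward pass of a simple ReLU MLP; params is a list of (W, b) pairs."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def init_mlp(sizes, rng):
    """He-style random initialization for the layer sizes in `sizes`."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the coordinate-based MLPs:
map_fg = init_mlp([3, 32, 32, 2], rng)     # (x, y, t) -> foreground atlas (u, v)
map_bg = init_mlp([3, 32, 32, 2], rng)     # (x, y, t) -> background atlas (u, v)
alpha_net = init_mlp([3, 32, 32, 1], rng)  # (x, y, t) -> opacity logit
atlas_fg = init_mlp([2, 32, 32, 3], rng)   # atlas (u, v) -> RGB
atlas_bg = init_mlp([2, 32, 32, 3], rng)

def render(pixels):
    """Composite two atlas layers at the given (x, y, t) pixel coordinates.

    pixels: (N, 3) array of normalized pixel coordinates.
    Returns the composited RGB values and the per-pixel alpha.
    """
    uv_fg = mlp(map_fg, pixels)            # atlas coordinates per layer
    uv_bg = mlp(map_bg, pixels)
    alpha = 1.0 / (1.0 + np.exp(-mlp(alpha_net, pixels)))  # sigmoid -> (0, 1)
    color_fg = mlp(atlas_fg, uv_fg)        # sample each atlas at its (u, v)
    color_bg = mlp(atlas_bg, uv_bg)
    return alpha * color_fg + (1.0 - alpha) * color_bg, alpha

pixels = rng.random((4, 3))                # four sample (x, y, t) coordinates
rgb, alpha = render(pixels)

# In training, the reconstruction loss would compare `rgb` against the
# observed video colors at `pixels`, e.g. mean squared error:
target = rng.random((4, 3))
recon_loss = np.mean((rgb - target) ** 2)
```

Because every quantity is a function of continuous (x, y, t) coordinates, an edit painted onto an atlas is automatically pulled back to every frame through the mapping networks, which is what makes single-image edits propagate consistently across the video.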