View synthesis is usually done by an autoencoder, in which the encoder maps a source view image into a latent content code and the decoder transforms it into a target view image according to the condition. However, the source contents are often not well preserved in this setting, which leads to unnecessary changes during the view translation. Although adding skip connections, as in U-Net, alleviates the problem, it often causes failures in view conformity. This paper proposes a new architecture that performs the source-to-target deformation iteratively. Instead of simply incorporating the features from multiple layers of the encoder, we design soft and hard deformation modules, which warp the encoder features to the target view at different resolutions and pass the results to the decoder to complement the details. In particular, the current warping flow is used not only to align the features of the same resolution, but also as an approximation to coarsely deform the higher-resolution features. A residual flow is then estimated and applied at the higher resolution, so that the deformation is built up in a coarse-to-fine fashion. To better constrain the model, we synthesize a rough target view image based on the intermediate flows and their warped features. Extensive ablation studies and the final results on two different datasets show the effectiveness of the proposed model.
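The coarse-to-fine warping described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation. It assumes single-channel feature maps stored as nested lists, a resolution that doubles between pyramid levels, nearest-neighbour flow upsampling, bilinear backward warping with border clamping, and a residual flow that is simply added to the upsampled flow. The residual estimator (`estimate_residual`, here the hypothetical `toy_estimator`) stands in for the learned network, and the distinction between soft and hard deformation modules is not modelled.

```python
import math
from typing import Callable, List, Optional, Tuple

Grid = List[List[float]]  # single-channel feature map or flow component, indexed [y][x]


def warp(feat: Grid, flow_x: Grid, flow_y: Grid) -> Grid:
    """Backward-warp a feature map with bilinear sampling.

    Output pixel (y, x) samples feat at (y + flow_y[y][x], x + flow_x[y][x]);
    sample positions outside the map are clamped to the border.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = min(max(x + flow_x[y][x], 0.0), w - 1.0)
            sy = min(max(y + flow_y[y][x], 0.0), h - 1.0)
            x0, y0 = math.floor(sx), math.floor(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            top = (1 - ax) * feat[y0][x0] + ax * feat[y0][x1]
            bot = (1 - ax) * feat[y1][x0] + ax * feat[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bot
    return out


def upsample_flow(flow: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling; displacements double because pixel units halve."""
    return [[2.0 * v for v in row for _ in range(2)] for row in flow for _ in range(2)]


def add(a: Grid, b: Grid) -> Grid:
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def coarse_to_fine_warp(
    pyramid: List[Grid],
    estimate_residual: Callable[[int, Grid], Tuple[Grid, Grid]],
) -> Tuple[List[Grid], Tuple[Grid, Grid]]:
    """Warp encoder features from coarsest to finest level (pyramid[0] is coarsest)."""
    warped_levels: List[Grid] = []
    flow_x: Optional[Grid] = None
    flow_y: Optional[Grid] = None
    for level, feat in enumerate(pyramid):
        h, w = len(feat), len(feat[0])
        if flow_x is None or flow_y is None:
            # Coarsest level: start from a zero flow.
            flow_x = [[0.0] * w for _ in range(h)]
            flow_y = [[0.0] * w for _ in range(h)]
        else:
            # Reuse the coarser flow as an approximation at this resolution.
            flow_x, flow_y = upsample_flow(flow_x), upsample_flow(flow_y)
        coarse = warp(feat, flow_x, flow_y)           # coarse deformation
        res_x, res_y = estimate_residual(level, coarse)
        flow_x, flow_y = add(flow_x, res_x), add(flow_y, res_y)  # refine with residual
        warped_levels.append(warp(feat, flow_x, flow_y))  # would be passed to the decoder
    return warped_levels, (flow_x, flow_y)


def toy_estimator(level: int, coarse: Grid) -> Tuple[Grid, Grid]:
    """Hypothetical stand-in: shift right by 0.5 px at the coarsest level only."""
    h, w = len(coarse), len(coarse[0])
    dx = 0.5 if level == 0 else 0.0
    return [[dx] * w for _ in range(h)], [[0.0] * w for _ in range(h)]


# Two-level pyramid of horizontal ramps: 2x2 (coarse) and 4x4 (fine).
pyramid = [[[float(x) for x in range(2)] for _ in range(2)],
           [[float(x) for x in range(4)] for _ in range(4)]]
warped, (fx, fy) = coarse_to_fine_warp(pyramid, toy_estimator)
print(warped[0][0])  # [0.5, 1.0]
print(warped[1][0])  # [1.0, 2.0, 3.0, 3.0]
```

Adding the residual to the upsampled flow keeps a single flow field per level, so it can be upsampled again for the next level; applying the residual as a second warp on the coarsely warped features is an equally plausible reading of the description.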