The task of unsupervised image-to-image translation has seen substantial advancements in recent years through the use of deep neural networks. Typically, the proposed solutions learn the characterizing distribution of two large, unpaired collections of images, and are able to alter the appearance of a given image while keeping its geometry intact. In this paper, we explore the capability of neural networks to understand image structure given only a single pair of images, A and B. We seek to generate images that are structurally aligned: that is, to generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A. The key idea is to map between image patches at different scales. This enables controlling the granularity at which analogies are produced, which determines the conceptual distinction between style and content. In addition to structural alignment, our method can be used to generate high-quality imagery in other conditional generation tasks utilizing images A and B only: guided image synthesis, style and texture transfer, text translation, as well as video translation. Our code and additional results are available at https://github.com/rmokady/structural-analogy/.
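The multi-scale patch idea can be illustrated with a minimal sketch. The helper names below are hypothetical and this is not the authors' implementation; it only shows how an image pyramid and its patches at each scale might be produced, with coarse scales capturing structure and fine scales capturing texture:

```python
import numpy as np

def build_pyramid(img, num_scales=3, factor=2):
    """Build a coarse-to-fine pyramid by simple average pooling.

    img: H x W x C float array. Returns a list [fine, ..., coarse].
    (Illustrative only; real methods typically use learned generators
    per scale.)
    """
    pyramid = [img]
    for _ in range(num_scales - 1):
        prev = pyramid[-1]
        h2, w2 = prev.shape[0] // factor, prev.shape[1] // factor
        cropped = prev[:h2 * factor, :w2 * factor]
        coarse = cropped.reshape(h2, factor, w2, factor, -1).mean(axis=(1, 3))
        pyramid.append(coarse)
    return pyramid

def extract_patches(img, size=7, stride=4):
    """Collect overlapping size x size patches; at coarse scales each
    patch covers a large image region (structure), at fine scales a
    small one (texture)."""
    patches = []
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return np.stack(patches)
```

Matching the patch distributions of A and B scale by scale (coarse scales for structure, fine scales for appearance) is the intuition behind choosing the granularity of the analogy.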