To see is to sketch: free-hand sketching naturally builds ties between human and machine vision. In this paper, we present a novel approach for translating an object photo to a sketch, mimicking the human sketching process. This is an extremely challenging task because the photo and sketch domains differ significantly. Furthermore, human sketches exhibit various levels of sophistication and abstraction even when depicting the same object instance in a reference photo. This means that even when photo-sketch pairs are available, they provide only a weak supervision signal for learning a translation model. In contrast to existing supervised approaches that solve the problem of D(E(photo)) -> sketch, where E($\cdot$) and D($\cdot$) denote the encoder and decoder respectively, we also take advantage of the inverse problem (i.e., D(E(sketch)) -> photo) and combine both with the unsupervised learning tasks of within-domain reconstruction, all within a multi-task learning framework. In contrast to existing unsupervised approaches based on cycle consistency (i.e., D(E(D(E(photo)))) -> photo), we introduce a shortcut consistency enforced at the encoder bottleneck (e.g., D(E(photo)) -> photo) to exploit the additional self-supervision. Both qualitative and quantitative results show that the proposed model is superior to a number of state-of-the-art alternatives. We also show that the synthetic sketches can be used to train a better fine-grained sketch-based image retrieval (FG-SBIR) model, effectively alleviating the problem of sketch data scarcity.
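The four mappings named above (photo to sketch, sketch to photo, and the two within-domain shortcut reconstructions) can be illustrated as a single multi-task objective. The following is only a minimal sketch under stated assumptions: the abstract does not specify the loss functions, the loss weights, or whether the two domains share parameters, so the mean squared error, the unit weights, the separate per-domain encoders and decoders, and every name here (`multitask_loss`, `enc_photo`, `dec_sketch`, and so on) are hypothetical placeholders rather than the authors' implementation.

```python
from typing import Callable, Dict, Optional, Sequence

Vector = Sequence[float]
Encoder = Callable[[Vector], Vector]
Decoder = Callable[[Vector], Vector]

# Names of the four decoding routes described in the abstract.
ROUTES = ("p2s", "s2p", "p2p", "s2s")


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error; a stand-in for the unspecified per-domain loss."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    photo: Vector,
    sketch: Vector,
    enc_photo: Encoder,
    enc_sketch: Encoder,
    dec_photo: Decoder,
    dec_sketch: Decoder,
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Combine supervised translation and shortcut reconstruction terms.

    The weights are an assumption; the abstract does not state them.
    """
    w = {name: 1.0 for name in ROUTES}
    if weights:
        w.update(weights)

    # Each input is encoded once; the shortcut terms decode directly
    # from this bottleneck instead of completing a full cycle.
    z_photo = enc_photo(photo)
    z_sketch = enc_sketch(sketch)

    terms = {
        "p2s": mse(dec_sketch(z_photo), sketch),   # D(E(photo)) -> sketch
        "s2p": mse(dec_photo(z_sketch), photo),    # D(E(sketch)) -> photo
        "p2p": mse(dec_photo(z_photo), photo),     # D(E(photo)) -> photo
        "s2s": mse(dec_sketch(z_sketch), sketch),  # D(E(sketch)) -> sketch
    }
    terms["total"] = sum(w[name] * terms[name] for name in ROUTES)
    return terms


def identity(v: Vector) -> Vector:
    """Toy network that returns its input unchanged."""
    return list(v)


def halve(v: Vector) -> Vector:
    """Toy network that scales its input by one half."""
    return [x / 2 for x in v]


# Toy usage: vectors stand in for real photo and sketch representations.
losses = multitask_loss(
    photo=[2.0, 4.0],
    sketch=[1.0, 2.0],
    enc_photo=identity,
    enc_sketch=identity,
    dec_photo=identity,
    dec_sketch=halve,
)
```

In this toy setup the shortcut terms `p2p` and `s2s` need only one encoder pass and one decoder pass per input, whereas a cycle-consistency term of the form D(E(D(E(photo)))) would chain two translations before comparing with the original photo.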