Representation learning aims to discover individual salient features of a domain in a compact, descriptive form that strongly identifies the unique characteristics of a given sample relative to its domain. Existing work on visual style representation has tried to explicitly disentangle style from content during training, but a complete separation between the two has yet to be achieved. Our paper aims to learn a representation of visual artistic style that is more strongly disentangled from the semantic content depicted in an image. We use Neural Style Transfer (NST) to measure and drive the learning signal, and achieve state-of-the-art representation learning on explicitly disentangled metrics. We show that strongly addressing the disentanglement of style and content leads to large gains in style-specific metrics, encodes far less semantic information, and achieves state-of-the-art accuracy in downstream multimodal applications.