Many real-world datasets can be divided into groups according to certain salient features (e.g. grouping images by subject, grouping text by font, etc.). Often, machine learning tasks require that these features be represented separately from those manifesting independently of the grouping. For example, image translation entails changing the style of an image while preserving its content. We formalize these two kinds of attributes as two complementary generative factors called "domain" and "content", and address the problem of disentangling them in a fully unsupervised way. To achieve this, we propose a principled, generalizable probabilistic model inspired by the Variational Autoencoder. Our model exhibits state-of-the-art performance on the composite task of generating images by combining the domain of one input with the content of another. Distinctively, it can perform this task in a few-shot, unsupervised manner, without being provided with explicit labelling for either domain or content. The disentangled representations are learned through the combination of a group-wise encoder and a novel domain-confusion loss.
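The group-wise encoding idea above can be sketched in a toy form: each group of images yields one shared "domain" code (pooled over the group) plus a per-image "content" code, and translation recombines the domain of one group with the content of another. This is a minimal numpy illustration under assumed placeholder dimensions and random linear maps, not the paper's actual architecture; the variational machinery and the domain-confusion loss are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustrative only, not from the paper):
# 4 images per group, 8-d inputs, 3-d domain code, 5-d content code.
GROUP_SIZE, X_DIM, D_DIM, C_DIM = 4, 8, 3, 5

# Random linear maps stand in for the learned encoder/decoder networks.
W_domain = rng.normal(size=(X_DIM, D_DIM))
W_content = rng.normal(size=(X_DIM, C_DIM))
W_dec = rng.normal(size=(D_DIM + C_DIM, X_DIM))

def encode_group(x_group):
    """Group-wise encoding: one shared domain code per group, pooled over
    all members, plus an individual content code per image."""
    domain = (x_group @ W_domain).mean(axis=0)   # (D_DIM,), shared
    content = x_group @ W_content                # (GROUP_SIZE, C_DIM)
    return domain, content

def decode(domain, content):
    """Recombine one domain code with a batch of content codes."""
    tiled = np.broadcast_to(domain, (content.shape[0], D_DIM))
    return np.concatenate([tiled, content], axis=1) @ W_dec

group_a = rng.normal(size=(GROUP_SIZE, X_DIM))
group_b = rng.normal(size=(GROUP_SIZE, X_DIM))

dom_a, _ = encode_group(group_a)   # take group A's domain
_, cont_b = encode_group(group_b)  # take group B's contents

# Few-shot translation: images with A's domain and B's content.
translated = decode(dom_a, cont_b)
print(translated.shape)  # (4, 8)
```

Pooling the domain code over the whole group is what lets the model infer a domain from a handful of unlabelled examples, while the per-image content codes carry everything that varies within a group.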