Deep learning models struggle with compositional generalization, i.e., the ability to recognize or generate novel combinations of observed elementary concepts. In hopes of enabling compositional generalization, various unsupervised learning algorithms have been proposed with inductive biases that aim to induce compositional structure in learned representations (e.g., disentangled representation learning and emergent language learning). In this work, we evaluate these unsupervised learning algorithms in terms of how well they enable compositional generalization. Specifically, our evaluation protocol focuses on whether it is easy to train a simple model on top of the learned representation that generalizes to new combinations of compositional factors. We systematically study three unsupervised representation learning algorithms, $\beta$-VAE, $\beta$-TCVAE, and emergent language (EL) autoencoders, on two datasets that allow directly testing compositional generalization. We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself. In addition, we find that previously proposed metrics for evaluating compositionality are not correlated with actual compositional generalization in our framework. Surprisingly, we find that increasing the pressure to produce a disentangled representation yields representations with worse generalization, while representations from EL models show strong compositional generalization. Taken together, our results shed new light on the compositional generalization behavior of different unsupervised learning algorithms, provide a new setting to rigorously test this behavior, and suggest the potential benefits of developing EL learning algorithms for more generalizable representations.