Deep learning models struggle with compositional generalization, i.e., the ability to recognize or generate novel combinations of observed elementary concepts. In hopes of enabling compositional generalization, various unsupervised learning algorithms have been proposed with inductive biases that aim to induce compositional structure in learned representations (e.g., disentangled representation learning and emergent language learning). In this work, we evaluate these unsupervised learning algorithms in terms of how well they enable compositional generalization. Specifically, our evaluation protocol focuses on whether or not it is easy to train a simple model on top of the learned representation that generalizes to new combinations of compositional factors. We systematically study three unsupervised representation learning algorithms -- $\beta$-VAE, $\beta$-TCVAE, and emergent language (EL) autoencoders -- on two datasets that allow directly testing compositional generalization. We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself. In addition, we find that previously proposed metrics for evaluating the level of compositionality are not correlated with actual compositional generalization in our framework. Surprisingly, we find that increasing the pressure to produce a disentangled representation yields representations with worse generalization, while representations from EL models show strong compositional generalization. Taken together, our results shed new light on the compositional generalization behavior of different unsupervised learning algorithms with a new setting to rigorously test this behavior, and suggest the potential benefits of developing EL learning algorithms for more generalizable representations.
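The evaluation protocol described above can be sketched in miniature: freeze an encoder, fit a simple readout on representations of seen factor combinations, and measure accuracy on a held-out novel combination. The sketch below is illustrative only; the `encode` function, the two binary factors, and the train/test split are hypothetical stand-ins, not the paper's actual models or datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical frozen encoder: here simulated as the generative factors
# plus noise, standing in for a pretrained bottleneck representation.
def encode(factors):
    return factors + 0.1 * rng.normal(size=factors.shape)

# Two binary factors (e.g., shape and color). Train on three of the four
# combinations; hold out the novel combination (1, 1) for testing.
train_factors = np.array([[0, 0], [0, 1], [1, 0]] * 50)
test_factors = np.array([[1, 1]] * 50)

z_train, z_test = encode(train_factors), encode(test_factors)
y_train, y_test = train_factors[:, 0], test_factors[:, 0]  # predict factor 0

# "Simple model on top of the representation": a linear probe.
probe = LogisticRegression().fit(z_train, y_train)
acc = probe.score(z_test, y_test)  # accuracy on the unseen combination
```

A representation that encodes the factors compositionally should let even this linear probe succeed on the held-out combination; entangled representations typically degrade this score.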