To highlight the challenges of unsupervised representation disentanglement in the text domain, we select a representative set of models that have been applied successfully in the image domain. We evaluate these models on six disentanglement metrics, as well as on downstream classification tasks and homotopy. To facilitate the evaluation, we propose two synthetic datasets with known generative factors. Our experiments highlight the existing gap in the text domain and illustrate that certain elements, such as representation sparsity (as an inductive bias) or coupling of the representation with the decoder, could impact disentanglement. To the best of our knowledge, our work is the first attempt at the intersection of unsupervised representation disentanglement and text, and it provides an experimental framework and datasets for examining future developments in this direction.