A key goal of unsupervised representation learning is "inverting" a data generating process to recover its latent properties. Existing work that provably achieves this goal relies on strong assumptions on relationships between the latent variables (e.g., independence conditional on auxiliary information). In this paper, we take a very different perspective on the problem and ask, "Can we instead identify latent properties by leveraging knowledge of the mechanisms that govern their evolution?" We provide a complete characterization of the sources of non-identifiability as we vary knowledge about a set of possible mechanisms. In particular, we prove that if we know the exact mechanisms under which the latent properties evolve, then identification can be achieved up to any equivariances that are shared by the underlying mechanisms. We generalize this characterization to settings where we only know some hypothesis class over possible mechanisms, as well as settings where the mechanisms are stochastic. We demonstrate the power of this mechanism-based perspective by showing that we can leverage our results to generalize existing identifiable representation learning results. These results suggest that by exploiting inductive biases on mechanisms, it is possible to design a range of new identifiable representation learning approaches.
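As an illustrative sketch of the known-mechanism case, consider a deterministic setting in notation that is assumed here and does not come from the text above: a generator $g$ maps latents to observations, a known mechanism $m$ evolves the latents, and a learned encoder $f$ is required to be consistent with $m$. Under these assumptions, the composition $a = f \circ g$ must commute with $m$, which is one way to read "identification up to equivariances of the mechanism".

```latex
% Assumed notation (hypothetical, for illustration only):
%   g : latents -> observations (injective), m : known latent mechanism,
%   f : learned encoder, a := f \circ g.
\begin{align}
  x_t &= g(z_t), \qquad z_{t+1} = m(z_t)
      && \text{(data generating process)} \\
  f(x_{t+1}) &= m\bigl(f(x_t)\bigr)
      && \text{(encoder consistent with the known mechanism)} \\
  f\bigl(g(m(z_t))\bigr) &= m\bigl(f(g(z_t))\bigr)
      && \text{(substituting the first line into the second)} \\
  a \circ m &= m \circ a, \qquad a := f \circ g
      && \text{(so $a$ is an equivariance of $m$)}
\end{align}
```

In this sketch, the learned representation $f(x_t) = a(z_t)$ matches the true latents only up to such a map $a$. The fewer maps that commute with $m$, the smaller the remaining ambiguity.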