Aligned latent spaces, where meaningful semantic shifts in the input space correspond to a translation in the embedding space, play an important role in the success of downstream tasks such as unsupervised clustering and data imputation. In this work, we prove that linear and nonlinear autoencoders produce aligned latent spaces by stretching along the left singular vectors of the data. We fully characterize the amount of stretching in linear autoencoders and provide an initialization scheme to arbitrarily stretch along the top directions using these networks. We also quantify the amount of stretching in nonlinear autoencoders in a simplified setting. We use our theoretical results to align drug signatures across cell types in gene expression space and semantic shifts in word embedding spaces.
翻译:维系的隐性空间, 输入空间中有意义的语义变化与嵌入空间的翻译相对应, 在下游任务的成功方面起着重要作用, 例如未受监督的集群和数据估算。 在这项工作中, 我们证明线性和非线性自动电解码器通过延展数据左单向矢量来生成匹配的隐性空间。 我们充分描述线性自动转换器的伸展量, 并提供一个初始化计划, 用这些网络任意沿着顶部方向延伸。 我们还量化了非线性自动转换器在简化环境中的伸展量 。 我们使用理论结果来将基因表达空间和文字嵌入空间的语义变化中的细胞特征对齐 。