Despite the success of diffusion models (DMs), we still lack a thorough understanding of their latent space. While image editing with GANs builds upon the latent space, DMs rely on editing conditions such as text prompts. We present an unsupervised method to discover interpretable editing directions for the latent variables $\mathbf{x}_t \in \mathcal{X}$ of DMs. Our method adopts Riemannian geometry between $\mathcal{X}$ and the intermediate feature maps $\mathcal{H}$ of the U-Nets to provide a deeper understanding of the geometrical structure of $\mathcal{X}$. The discovered semantic latent directions mostly yield disentangled attribute changes, and they are globally consistent across different samples. Furthermore, edits at earlier timesteps change coarse attributes, while edits at later timesteps focus on high-frequency details. We define the curvedness of a line segment between samples to show that $\mathcal{X}$ is a curved manifold. Experiments on different baselines and datasets demonstrate the effectiveness of our method, even on Stable Diffusion. Our source code will be made publicly available for future researchers.
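
The abstract does not spell out how Riemannian geometry between $\mathcal{X}$ and $\mathcal{H}$ yields editing directions. One common way to realize such a construction, assumed here for illustration only, is to pull a Euclidean metric on $\mathcal{H}$ back to $\mathcal{X}$ through the Jacobian $J$ of a map $f: \mathcal{X} \to \mathcal{H}$, so that the leading right singular vectors of $J$ are the latent directions that change the features most. The sketch below uses a hypothetical toy map `toy_feature_map` in place of a U-Net and a finite-difference Jacobian; none of these names come from the paper.

```python
import numpy as np

def toy_feature_map(x):
    """Hypothetical stand-in for the map from a latent x_t to features h."""
    W = np.array([[3.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    return np.tanh(W @ x)

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x, with shape (dim_h, dim_x)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def latent_directions(f, x, k=1):
    """Top-k right singular vectors of the Jacobian of f at x.

    Assumption: with a Euclidean metric on the feature space, the pullback
    metric on the latent space is J^T J, whose leading eigenvectors are the
    leading right singular vectors of J.
    """
    J = numerical_jacobian(f, x)
    _, singular_values, vt = np.linalg.svd(J, full_matrices=False)
    return vt[:k], singular_values[:k]

x0 = np.zeros(3)
directions, strengths = latent_directions(toy_feature_map, x0, k=2)
print(directions)
print(strengths)
```

In this toy setting the first direction aligns with the first latent coordinate, because the map is most sensitive along it. A real diffusion model would need automatic differentiation or a power-iteration approximation rather than finite differences, since the latent dimension is far too large for an explicit Jacobian.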