Diffusion models have established the state of the art in text-to-image generation, but their performance often relies on a diffusion prior network that translates text embeddings into the visual manifold for easier decoding. These priors are computationally expensive and require extensive training on massive datasets. In this work, we challenge the necessity of a trained prior by replacing it with Optimization-based Visual Inversion (OVI), a training-free and data-free alternative. OVI initializes a latent visual representation from random pseudo-tokens and iteratively optimizes it to maximize its cosine similarity with the embedding of the input text prompt. We further propose two novel constraints, a Mahalanobis-based loss and a Nearest-Neighbor loss, to regularize the OVI optimization process toward the distribution of realistic images. Our experiments, conducted on Kandinsky 2.2, show that OVI can serve as an alternative to traditional priors. More importantly, our analysis reveals a critical flaw in current evaluation benchmarks such as T2I-CompBench++: simply using the text embedding as a prior achieves surprisingly high scores despite lower perceptual quality. Our constrained OVI methods improve visual fidelity over this baseline. The Nearest-Neighbor approach proves particularly effective, achieving quantitative scores comparable to or higher than those of the state-of-the-art data-efficient prior, which indicates that the idea merits further investigation. The code will be made publicly available upon acceptance.
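
The optimization loop described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `ovi_sketch`, the hyperparameters, the diagonal-covariance form of the Mahalanobis term, and the use of plain gradient descent are all assumptions made for illustration, and the Nearest-Neighbor loss is omitted. In a real pipeline the target would be a text-encoder embedding and the result would be passed to the image decoder; here both are stand-in vectors.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def ovi_sketch(text_emb, mean, var, steps=500, lr=0.1, lam=0.01, seed=0):
    """Toy optimization-based inversion (hypothetical helper).

    Minimizes  -cos(v, text_emb) + lam * sum((v - mean)**2 / var)
    by gradient descent, starting from a random vector v.
    `mean` and `var` stand in for per-dimension statistics of real
    image embeddings (a diagonal-covariance assumption).
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=text_emb.shape)      # random initialization
    t = text_emb / np.linalg.norm(text_emb)  # unit-norm target
    for _ in range(steps):
        n = np.linalg.norm(v)
        c = v @ t / n                        # current cosine similarity
        grad_cos = t / n - c * v / n**2      # d cos(v, t) / d v
        grad_maha = 2.0 * (v - mean) / var   # gradient of the squared distance
        v = v - lr * (-grad_cos + lam * grad_maha)
    return v


# Usage with stand-in data: a random "text embedding" and unit statistics.
rng = np.random.default_rng(1)
text_emb = rng.normal(size=16)
v = ovi_sketch(text_emb, mean=np.zeros(16), var=np.ones(16))
print(cosine(v, text_emb))
```

The weight `lam` trades alignment with the text embedding against closeness to the assumed image-embedding statistics. With a nonzero `mean`, a larger `lam` pulls the result toward that mean and lowers the achievable cosine similarity.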


