ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

Video generation models have made remarkable progress and perform especially well in realistic scenarios; however, their performance degrades notably in imaginative scenarios. Prompts describing such scenarios often combine rarely co-occurring concepts linked by long-distance semantic relationships, which fall outside the training distribution. Existing methods typically apply test-time scaling to improve video quality, but their fixed search spaces and static reward designs limit their adaptability to imaginative scenarios. To fill this gap, we propose ImagerySearch, a prompt-guided adaptive test-time search strategy that dynamically adjusts both the inference search space and the reward function according to the semantic relationships in the prompt. This enables the generation of more coherent and visually plausible videos in challenging imaginative settings. To evaluate progress in this direction, we introduce LDT-Bench, the first benchmark dedicated to long-distance semantic prompts, consisting of 2,839 diverse concept pairs and an automated protocol for assessing creative generation capabilities. Extensive experiments show that ImagerySearch consistently outperforms strong video generation baselines and existing test-time scaling approaches on LDT-Bench and achieves competitive improvements on VBench, demonstrating its effectiveness across diverse prompt types. We will release LDT-Bench and the code to facilitate future research on imaginative video generation.
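The abstract does not say how the search space and reward are adapted to the prompt. The Python sketch below illustrates one possible reading, under purely hypothetical assumptions: semantic distance is taken as 1 minus the cosine similarity of two concept embeddings, a larger distance widens the best-of-N candidate pool, and it also shifts reward weight from visual quality toward prompt alignment. The functions `semantic_distance`, `search_budget`, `adaptive_reward`, and `best_of_n`, the constants, and the random stand-in scores are all illustrative inventions, not ImagerySearch's actual algorithm.

```python
import math
import random
from dataclasses import dataclass


def semantic_distance(a: list[float], b: list[float]) -> float:
    """Hypothetical proxy: 1 - cosine similarity between two concept embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search_budget(distance: float, base: int = 4, extra: int = 8) -> int:
    """Assumed rule: widen the candidate pool for semantically distant concepts."""
    # Cosine distance lies in [0, 2]; clamp to [0, 1] before scaling.
    return base + round(extra * min(max(distance, 0.0), 1.0))


def adaptive_reward(alignment: float, quality: float, distance: float) -> float:
    """Assumed rule: weight prompt alignment more heavily as distance grows."""
    w = min(max(distance, 0.0), 1.0)
    return w * alignment + (1.0 - w) * quality


@dataclass
class Candidate:
    seed: int
    alignment: float  # stand-in for a video-text alignment score
    quality: float    # stand-in for a visual-quality score


def best_of_n(distance: float, rng: random.Random) -> tuple[Candidate, int]:
    """Sample a distance-dependent number of toy candidates and keep the best one."""
    n = search_budget(distance)
    # Random scores stand in for decoding and scoring real videos.
    pool = [Candidate(s, rng.random(), rng.random()) for s in range(n)]
    best = max(pool, key=lambda c: adaptive_reward(c.alignment, c.quality, distance))
    return best, n


if __name__ == "__main__":
    near = semantic_distance([1.0, 0.0], [0.9, 0.1])  # closely related concepts
    far = semantic_distance([1.0, 0.0], [0.0, 1.0])   # unrelated concepts
    print(search_budget(near), search_budget(far))
    print(best_of_n(far, random.Random(0)))
```

Running the script prints `4 12` on its first line: the near pair keeps the base budget, while the orthogonal pair receives the full extra budget. Coupling the budget and the reward weights to the same distance is only one simple design choice for making both components prompt-dependent.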