Achieving character animation that meets studio-grade production standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a reference image, but they often fail to preserve structural fidelity and temporal consistency in in-the-wild scenarios involving complex motion and cross-identity animation. In this work, we present \textbf{SCAIL} (\textbf{S}tudio-grade \textbf{C}haracter \textbf{A}nimation via \textbf{I}n-context \textbf{L}earning), a framework that addresses these challenges through two key innovations. First, we propose a novel 3D pose representation that provides a more robust and flexible motion signal. Second, we introduce a full-context pose injection mechanism within a diffusion-transformer architecture, enabling effective spatio-temporal reasoning over full motion sequences. To align with studio-level requirements, we develop a curated data pipeline that ensures both diversity and quality, and we establish a comprehensive benchmark for systematic evaluation. Experiments show that \textbf{SCAIL} achieves state-of-the-art performance and advances character animation toward studio-grade reliability and realism.