In professional video compositing workflows, artists must manually create environmental interactions, such as shadows, reflections, dust, and splashes, between foreground subjects and background layers. Existing video generative models struggle to preserve the input video while adding such effects, and current video inpainting methods either require costly per-frame masks or yield implausible results. We introduce augmented compositing, a new task that synthesizes realistic, semi-transparent environmental effects conditioned on text prompts and input video layers while preserving the original scene. To address this task, we present Over++, a video effect generation framework that makes no assumptions about camera pose, scene stationarity, or depth supervision. We construct a paired effect dataset tailored to this task and introduce an unpaired augmentation strategy that preserves text-driven editability. Our method also supports optional mask control and keyframe guidance without requiring dense annotations. Despite training on limited data, Over++ produces diverse and realistic environmental effects and outperforms existing baselines in both effect generation and scene preservation.