Realistic visual simulations are omnipresent, yet creating them demands substantial computing time, rendering effort, and expert animation knowledge. Open-vocabulary visual effects generation from text inputs is emerging as a promising solution that can unlock immense creative potential. However, current pipelines lack both physical realism and effective language interfaces, and they rely on slow offline optimization. In contrast, PhysTalk takes a 3D Gaussian Splatting (3DGS) scene as input and translates arbitrary user prompts into real-time, physics-based, interactive 4D animations. A large language model (LLM) generates executable code that directly modifies 3DGS parameters through lightweight proxies and particle dynamics. Notably, PhysTalk is the first framework to couple 3DGS directly with a physics simulator without relying on time-consuming mesh extraction. While remaining open-vocabulary, this design enables interactive 3D Gaussian animation via collision-aware, physics-based manipulation of arbitrary, multi-material objects. Finally, PhysTalk is training-free and computationally lightweight, which makes 4D animation broadly accessible and shifts these workflows from a "render and wait" paradigm toward an interactive dialogue with a modern, physics-informed pipeline.