We present GELATO, the first language-driven trajectory reshaping framework that embeds geometric environment awareness and multi-agent feedback orchestration to support multi-instruction human-robot interaction. Unlike prior learning-based methods, our approach automatically registers scene objects as 6D geometric primitives via a VLM-assisted multi-view pipeline, and an LLM translates free-form, multi-instruction commands into explicit, verifiable geometric constraints. These constraints are integrated into a geometry-aware vector-field optimization that adapts initial trajectories while preserving smoothness, feasibility, and clearance. We further introduce a multi-agent orchestration scheme with observer-based refinement to handle multi-instruction inputs and interactions among objectives, increasing the success rate without retraining. Simulation and real-world experiments demonstrate that our method achieves smoother, safer, and more interpretable trajectory modifications than state-of-the-art baselines.
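To make the idea of geometry-aware vector-field trajectory adaptation concrete, here is a minimal toy sketch, not the paper's actual optimizer: a repulsive field pushes waypoints out of a clearance margin around a point obstacle, while a discrete Laplacian term preserves smoothness. The field form, the smoothing term, and all parameter values are illustrative assumptions, and the obstacle is simplified to a point rather than a 6D primitive.

```python
import numpy as np

def reshape_trajectory(traj, obstacle_center, clearance=0.3,
                       iters=100, step=0.05, smooth=0.4):
    """Toy vector-field update: repel waypoints from an obstacle,
    smooth with a discrete Laplacian, keep endpoints fixed.
    (Illustrative only; not the GELATO optimizer.)"""
    traj = np.asarray(traj, dtype=float).copy()
    for _ in range(iters):
        diff = traj - obstacle_center                      # obstacle -> waypoint vectors
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        # repulsive field, active only inside the clearance margin
        repel = np.where(dist < clearance,
                         (clearance - dist) * diff / np.maximum(dist, 1e-9),
                         0.0)
        # discrete Laplacian keeps the reshaped path smooth
        lap = np.zeros_like(traj)
        lap[1:-1] = traj[:-2] - 2.0 * traj[1:-1] + traj[2:]
        update = step * repel + smooth * lap
        update[[0, -1]] = 0.0                              # endpoints are hard constraints
        traj += update
    return traj

# straight-line path passing 0.1 above a point obstacle
line = np.stack([np.linspace(-1.0, 1.0, 21), np.zeros(21)], axis=1)
obstacle = np.array([0.0, -0.1])
out = reshape_trajectory(line, obstacle)
```

In this sketch, explicit geometric constraints (the clearance radius) directly shape the field, which is what makes the resulting modification verifiable and interpretable rather than learned end-to-end.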