Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic manipulation, yet they remain limited in diagnosing failures and learning from them. Moreover, existing failure datasets are mostly generated programmatically in simulation, which limits their generalization to the real world. To address these limitations, we introduce ViFailback, a framework designed to diagnose robotic manipulation failures and provide both textual and visual correction guidance. The framework uses explicit visual symbols to improve annotation efficiency. We further release the ViFailback dataset, a large-scale collection of 58,126 Visual Question Answering (VQA) pairs along with their corresponding 5,202 real-world manipulation trajectories. Based on this dataset, we establish ViFailback-Bench, a benchmark of 11 fine-grained VQA tasks that assesses the failure diagnosis and correction abilities of Vision-Language Models (VLMs); it comprises ViFailback-Bench Lite for closed-ended evaluation and ViFailback-Bench Hard for open-ended evaluation. To demonstrate the effectiveness of our framework, we build ViFailback-8B, a VLM that not only achieves a significant overall performance improvement on ViFailback-Bench but also generates visual symbols for corrective action guidance. Finally, by integrating ViFailback-8B with a VLA model, we conduct real-world robotic experiments demonstrating its ability to help the VLA model recover from failures. Project Website: https://x1nyuzhou.github.io/vifailback.github.io/