With the rapid advancement of artificial intelligence and robotics, the integration of Large Language Models (LLMs) with 3D vision is emerging as a transformative approach to enhancing robotic sensing technologies. This convergence enables machines to perceive, reason and interact with complex environments through natural language and spatial understanding, bridging the gap between linguistic intelligence and spatial perception. This review provides a comprehensive analysis of state-of-the-art methodologies, applications and challenges at the intersection of LLMs and 3D vision, with a focus on next-generation robotic sensing technologies. We first introduce the foundational principles of LLMs and 3D data representations, followed by an in-depth examination of 3D sensing technologies critical for robotics. The review then explores key advancements in scene understanding, text-to-3D generation, object grounding and embodied agents, highlighting cutting-edge techniques such as zero-shot 3D segmentation, dynamic scene synthesis and language-guided manipulation. Furthermore, we discuss multimodal LLMs that integrate 3D data with touch, auditory and thermal inputs, enhancing environmental comprehension and robotic decision-making. To support future research, we catalog benchmark datasets and evaluation metrics tailored for 3D-language and vision tasks. Finally, we identify key challenges and future research directions, including adaptive model architectures, enhanced cross-modal alignment and real-time processing capabilities, which pave the way for more intelligent, context-aware and autonomous robotic sensing systems.
翻译:随着人工智能和机器人技术的快速发展,大语言模型与三维视觉的融合正成为提升机器人感知技术的变革性方法。这种融合使机器能够通过自然语言和空间理解来感知、推理并与复杂环境交互,从而弥合语言智能与空间感知之间的鸿沟。本文综述了大语言模型与三维视觉交叉领域的最新方法、应用及挑战,重点关注下一代机器人感知技术。首先介绍大语言模型和三维数据表示的基础原理,随后深入探讨对机器人至关重要的三维感知技术。接着,综述分析了场景理解、文本到三维生成、物体定位及具身智能体等关键进展,重点介绍了零样本三维分割、动态场景合成和语言引导操作等前沿技术。此外,本文讨论了融合三维数据与触觉、听觉及热输入的多模态大语言模型,这些模型增强了环境理解与机器人决策能力。为支持未来研究,我们整理了专为三维-语言及视觉任务设计的基准数据集和评估指标。最后,我们指出了关键挑战及未来研究方向,包括自适应模型架构、增强的跨模态对齐和实时处理能力,这些方向为开发更智能、情境感知和自主的机器人感知系统铺平了道路。