As artificial intelligence (AI) advances rapidly, especially with multimodal large language models (MLLMs), research focus is shifting from single-modality text processing toward the more complex domains of multimodal and embodied AI. Embodied intelligence focuses on training agents in realistic simulated environments, where they learn from physical interaction and action feedback rather than from conventional labeled datasets. Yet most existing simulation platforms remain narrowly designed, each tailored to a specific set of tasks. A versatile, general-purpose training environment that supports everything from low-level embodied navigation to high-level composite activities, such as multi-agent social simulation and human-AI collaboration, remains largely unavailable. To bridge this gap, we introduce TongSIM, a high-fidelity, general-purpose platform for training and evaluating embodied agents. TongSIM provides over 100 diverse, multi-room indoor scenarios as well as an open-ended, interaction-rich outdoor town simulation, making it applicable to a broad range of research needs. Its comprehensive evaluation framework and benchmarks enable precise assessment of agent capabilities such as perception, cognition, decision-making, human-robot cooperation, and spatial and social reasoning. With features such as customizable scenes, task-adaptive fidelity, diverse agent types, and dynamic environmental simulation, TongSIM offers researchers flexibility and scalability, serving as a unified platform that accelerates the training and evaluation of embodied agents and advances progress toward general embodied intelligence.