Align While Search: Belief-Guided Exploratory Inference for World-Grounded Embodied Agents

December 30, 2025

Seohui Bae,Jeonghye Kim,Youngchul Sung,Woohyung Lim

In this paper, we propose a test-time adaptive LLM agent for partially observable environments that performs exploratory inference through posterior-guided belief refinement, without relying on gradient-based updates or additional training. Our agent maintains an external structured belief over the environment state, iteratively updates it via action-conditioned observations, and selects actions by maximizing predicted information gain over the belief space. We estimate information gain using a lightweight LLM-based surrogate and assess world alignment through a novel reward that quantifies the consistency between the posterior belief and the ground-truth environment configuration. Experiments show that our method outperforms inference-time scaling baselines, such as prompt-augmented or retrieval-enhanced LLMs, in aligning with latent world states, with significantly lower integration overhead.
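
The belief-update and action-selection loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says information gain is estimated with a lightweight LLM-based surrogate, whereas the sketch below swaps in an exact Bayesian expected information gain over a small discrete belief. The state names, the action names, and the `LIKELIHOOD` observation table are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical hidden states: where an object might be located.
STATES = ["drawer", "shelf", "table"]

# Hypothetical observation model (an assumption, not from the paper):
# LIKELIHOOD[action][state][observation] = probability.
# Checking a location reveals the object if and only if it is there.
LIKELIHOOD = {
    f"check_{loc}": {s: ({"found": 1.0} if s == loc else {"empty": 1.0}) for s in STATES}
    for loc in STATES
}

def entropy(belief):
    """Shannon entropy of a belief (dict state -> probability), in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def update_belief(belief, action, obs):
    """Bayes rule: posterior is proportional to prior times P(obs | state, action)."""
    unnorm = {s: p * LIKELIHOOD[action][s].get(obs, 0.0) for s, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0:  # observation impossible under the model: keep the prior
        return dict(belief)
    return {s: v / total for s, v in unnorm.items()}

def expected_info_gain(belief, action):
    """Expected entropy reduction of the belief after taking `action`."""
    prior_h = entropy(belief)
    observations = {o for s in belief for o in LIKELIHOOD[action][s]}
    gain = 0.0
    for obs in observations:
        p_obs = sum(p * LIKELIHOOD[action][s].get(obs, 0.0) for s, p in belief.items())
        if p_obs > 0:
            gain += p_obs * (prior_h - entropy(update_belief(belief, action, obs)))
    return gain

def select_action(belief):
    """Pick the action with the largest expected information gain."""
    return max(LIKELIHOOD, key=lambda a: expected_info_gain(belief, a))

# One step of the loop: choose an action, observe, refine the belief.
belief = {"drawer": 0.5, "shelf": 0.3, "table": 0.2}
action = select_action(belief)
belief_after = update_belief(belief, action, "empty")
print(action, belief_after)
```

With this prior, checking the drawer splits the belief mass evenly and is therefore the most informative action; after an "empty" observation the remaining mass is renormalized over the shelf and the table. In the paper's setting the exact computation would be replaced by the LLM-based estimate, and the belief would be a structured description of the environment rather than a three-entry table.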
