复杂智能体环境中结构增强轨迹学习与编码的表征框架 (A representational framework for learning and encoding structurally enriched trajectories in complex agent environments)

The ability of artificial intelligence agents to make optimal decisions and generalise them to different domains and tasks is compromised in complex scenarios. One way to address this issue has focused on learning efficient representations of the world and on how the actions of agents affect them in state-action transitions. Whereas such representations are procedurally efficient, they lack structural richness. To address this problem, we propose to enhance the agent's ontology and extend the traditional conceptualisation of trajectories to provide a more nuanced view of task execution. Structurally Enriched Trajectories (SETs) extend the encoding of sequences of states and their transitions by incorporating hierarchical relations between objects, interactions, and affordances. SETs are built as multi-level graphs, providing a detailed representation of the agent dynamics and a transferable functional abstraction of the task. SETs are integrated into an architecture, Structurally Enriched Trajectory Learning and Encoding (SETLE), that employs a heterogeneous graph-based memory structure of multi-level relational dependencies essential for generalisation. We demonstrate that SETLE can support downstream tasks, enabling agents to recognise task relevant structural patterns across CREATE and MiniGrid environments. Finally, we integrate SETLE with reinforcement learning and show measurable improvements in downstream performance, including breakthrough success rates in complex, sparse-reward tasks.

翻译：在复杂场景中，人工智能智能体做出最优决策并将其泛化至不同领域和任务的能力受到限制。解决此问题的一种方法聚焦于学习世界的有效表征，以及智能体行为在状态-动作转移中如何影响这些表征。尽管此类表征具有过程效率，但缺乏结构丰富性。为解决该问题，我们提出增强智能体的本体论，并扩展传统的轨迹概念化方式，以提供对任务执行更细致的视角。结构增强轨迹通过纳入对象间的层次关系、交互作用及可供性，扩展了对状态序列及其转移的编码。SETs被构建为多层图结构，提供了智能体动态的详细表征和任务的可迁移功能抽象。SETs被整合至一个架构——结构增强轨迹学习与编码中，该架构采用基于异质图的多层关系依赖记忆结构，这对泛化至关重要。我们证明SETLE能够支持下游任务，使智能体在CREATE和MiniGrid环境中识别任务相关的结构模式。最后，我们将SETLE与强化学习相结合，展示了下游性能的可量化提升，包括在复杂稀疏奖励任务中突破性的成功率。