Graphical User Interface (GUI) agents must use historical context effectively to perform sequential navigation tasks. While incorporating past actions and observations can improve decision making, naively using the full history incurs excessive computational overhead and distracts the agent with irrelevant information. To address this, we introduce HiconAgent, a GUI agent trained with History Context-aware Policy Optimization (HCPO) for efficient and effective use of historical information. HCPO optimizes history usage in both sampling and policy updates through two complementary components: (1) Dynamic Context Sampling (DCS) presents the agent with variable-length histories during sampling, enabling adaptive use of the most relevant context; (2) Anchor-guided History Compression (AHC) refines the policy update phase with a dual-branch strategy in which the compressed branch removes history observations while keeping history actions as information-flow anchors. The compressed and uncompressed branches are coupled through a history-enhanced alignment loss to enforce consistent history usage while maintaining efficiency. Experiments on mainstream GUI navigation benchmarks demonstrate strong performance. Despite being smaller, HiconAgent-3B outperforms GUI-R1-7B by +8.46 percent in grounding accuracy and +11.32 percent in step success rate on GUI-Odyssey, while achieving comparable results on AndroidControl and AITW with up to a 2.47x computational speedup and a 60 percent reduction in FLOPs.
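To make the two components concrete, the following is a minimal Python sketch of the data handling they imply. It is illustrative only and not the authors' implementation: the `Step` type, the function names `sample_history` and `compress_history`, the uniform choice of history length, and the use of strings as stand-ins for screenshots are all assumptions introduced here. The abstract does not specify the sampling distribution or the form of the history-enhanced alignment loss, so the loss is not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One past interaction step (hypothetical type for illustration)."""
    observation: Optional[str]  # stand-in for a screenshot; None once removed
    action: str


def sample_history(history: List[Step], max_len: int, rng: random.Random) -> List[Step]:
    """DCS-style sketch: keep only the k most recent steps, with k drawn at random.

    Assumption: k is uniform on [0, min(max_len, len(history))]; the
    abstract only states that history length varies during sampling.
    """
    k = rng.randint(0, min(max_len, len(history)))
    # Slicing from len(history) - k also handles k == 0 (empty history).
    return history[len(history) - k:]


def compress_history(history: List[Step]) -> List[Step]:
    """AHC-style sketch of the compressed branch.

    Drops every past observation but keeps every past action, so the
    actions remain as anchors in the shortened context.
    """
    return [Step(observation=None, action=s.action) for s in history]


if __name__ == "__main__":
    rng = random.Random(0)
    full = [Step(f"screenshot_{i}", f"action_{i}") for i in range(4)]
    sampled = sample_history(full, max_len=3, rng=rng)
    compressed = compress_history(sampled)
    print(len(sampled), [s.action for s in compressed])
```

In this sketch the uncompressed branch would consume `sampled` as is and the compressed branch would consume `compressed`; coupling the two through an alignment loss is the part of HCPO that the sketch leaves out.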