Training Proactive and Personalized LLM Agents

While existing work focuses primarily on task success, we argue that effective real-world agents require optimizing three dimensions: productivity (task completion), proactivity (asking essential questions), and personalization (adapting to diverse user preferences). We introduce UserVille, an interactive environment whose LLM-based user simulators support diverse, configurable user preferences. Leveraging UserVille, we propose PPP, a multi-objective reinforcement learning approach that jointly optimizes all three dimensions: Productivity, Proactivity, and Personalization. Experiments on software engineering and deep research tasks show that agents trained with PPP achieve substantial improvements over strong baselines such as GPT-5 (+21.6 on average), asking strategic clarifying questions, adapting to unseen user preferences, and improving task success through better interaction. These results indicate that explicitly optimizing for user-centered interaction is critical for building practical and effective AI agents.
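
To make the idea of jointly optimizing three objectives concrete, the following is a minimal sketch of one common way to fold several per-episode signals into a single scalar reward: a weighted sum. It is not the PPP reward described in the paper; the abstract does not specify the reward, so every field name, weight, and penalty below (`EpisodeSignals`, `combined_reward`, the 0.25 penalty per unnecessary question) is a hypothetical choice made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSignals:
    """Per-episode measurements. All fields are hypothetical, for illustration."""
    task_success: bool              # productivity: did the agent complete the task?
    essential_questions_asked: int  # essential clarifying questions the agent asked
    essential_questions_total: int  # essential clarifying questions that existed
    unnecessary_questions: int      # questions the agent did not need to ask
    preferences_followed: int       # user preferences the agent respected
    preferences_total: int          # user preferences configured for this user


def _fraction(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 1.0 when there was nothing to satisfy."""
    return 1.0 if denominator == 0 else numerator / denominator


def combined_reward(
    s: EpisodeSignals,
    w_prod: float = 1.0,    # assumed weight, not taken from the paper
    w_proact: float = 0.5,  # assumed weight, not taken from the paper
    w_pers: float = 0.5,    # assumed weight, not taken from the paper
) -> float:
    """Weighted-sum scalarization of three objectives (illustrative only)."""
    productivity = 1.0 if s.task_success else 0.0
    # Reward asking the essential questions, penalize each unnecessary one.
    proactivity = (
        _fraction(s.essential_questions_asked, s.essential_questions_total)
        - 0.25 * s.unnecessary_questions
    )
    personalization = _fraction(s.preferences_followed, s.preferences_total)
    return w_prod * productivity + w_proact * proactivity + w_pers * personalization


if __name__ == "__main__":
    good = EpisodeSignals(True, 2, 2, 0, 1, 1)
    poor = EpisodeSignals(False, 1, 2, 2, 0, 2)
    print(combined_reward(good), combined_reward(poor))
```

A weighted sum is the simplest scalarization; the weights control the trade-off, for example how much task success an agent may give up to avoid bothering the user with unnecessary questions.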
