AI agents are continually optimized for tasks related to human work, such as software engineering and professional writing, signaling a pressing trend with significant impacts on the human workforce. However, agent development has often not been grounded in a clear understanding of how humans execute work, one that would reveal what expertise agents possess and what roles they can play in diverse workflows. In this work, we study how agents do human work by presenting the first direct comparison of human and agent workers across multiple essential work-related skills: data analysis, engineering, computation, writing, and design. To better understand and compare heterogeneous computer-use activities of workers, we introduce a scalable toolkit to induce interpretable, structured workflows from either human or agent computer-use activities. Using these induced workflows, we compare how humans and agents perform the same tasks and find that: (1) While agents exhibit promise in their alignment to human workflows, they take an overwhelmingly programmatic approach across all work domains, even for open-ended, visually dependent tasks like design, in contrast to the UI-centric methods typically used by humans. (2) Agents produce work of inferior quality, yet often mask their deficiencies via data fabrication and misuse of advanced tools. (3) Nonetheless, agents deliver results 88.3% faster and cost 90.4-96.2% less than humans, highlighting the potential for enabling efficient collaboration by delegating easily programmable tasks to agents.