The global economy is increasingly dependent on knowledge workers to meet the needs of public and private organizations. While there is no single definition of knowledge work, organizations and industry groups still attempt to measure individuals' capability to engage in it. The most comprehensive assessment of capability readiness for professional knowledge workers is the Uniform CPA Examination developed by the American Institute of Certified Public Accountants (AICPA). In this paper, we experimentally evaluate OpenAI's `text-davinci-003` and prior versions of GPT on both a sample Regulation (REG) exam and an assessment of over 200 multiple-choice questions based on the AICPA Blueprints for legal, financial, accounting, technology, and ethical tasks. First, we find that `text-davinci-003` achieves a correct rate of 14.4% on a sample REG exam section, significantly underperforming human capabilities on quantitative reasoning in zero-shot prompts. Second, `text-davinci-003` appears to be approaching human-level performance on the Remembering & Understanding and Application skill levels of the Exam when no calculation is required. With the best prompt and parameters, the model answers 57.6% of questions correctly, significantly better than the 25% guessing rate, and its top two answers are correct 82.1% of the time, indicating strong non-entailment. Finally, we find that recent generations of GPT-3 demonstrate material improvements on this assessment, rising from 30% for `text-davinci-001` to 57% for `text-davinci-003`. These findings strongly suggest that large language models have the potential to transform the quality and efficiency of future knowledge work.
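To give a sense of how far 57.6% sits above the 25% guessing rate, the following is a minimal sketch of an exact one-sided binomial test. It is not the authors' analysis. The abstract says only "over 200" questions, so the question count of 200 is an assumption made for illustration, and the function and variable names are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: the abstract reports "over 200" questions; 200 is used here
# only as an illustrative lower bound, not as the paper's actual count.
N_QUESTIONS = 200
ACCURACY = 0.576  # best-prompt accuracy reported in the abstract
CHANCE = 0.25     # guessing rate reported in the abstract

# Number of correct answers implied by the assumed question count.
n_correct = round(N_QUESTIONS * ACCURACY)

# Probability of scoring at least this well by guessing alone.
p_value = binomial_tail(N_QUESTIONS, n_correct, CHANCE)

print(f"correct answers (assumed n={N_QUESTIONS}): {n_correct}")
print(f"P(at least that many correct by chance): {p_value:.3e}")
```

Under this assumed count, the probability of reaching that score by guessing is vanishingly small, which is consistent with the abstract's description of the result as significantly better than chance. A larger question count would make the tail probability smaller still.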