Using Sampling to Estimate and Improve Performance of Automated Scoring Systems with Guarantees

Automated Scoring (AS), the natural language processing task of scoring essays and speeches in an educational testing setting, is growing in popularity and is being deployed across contexts ranging from government examinations to companies providing language proficiency services. However, existing systems either forgo human raters entirely, thus harming the reliability of the test, or score every response by both human and machine, thereby increasing costs. We target the spectrum of possible solutions in between, making use of both humans and machines to provide a higher-quality test while keeping costs reasonable, in order to democratize access to AS. In this work, we propose a combination of the existing paradigms that intelligently samples responses to be scored by humans. We propose reward sampling and observe significant gains in accuracy (19.80% increase on average) and quadratic weighted kappa (QWK) (25.60% increase on average) with a relatively small human budget (30% of samples). By comparison, the accuracy increases observed using standard random and importance sampling baselines are 8.6% and 12.2%, respectively. Furthermore, we demonstrate the system's model-agnostic nature by measuring its performance on a variety of models currently deployed in an AS setting as well as on pseudo models. Finally, we propose an algorithm to estimate the accuracy/QWK with statistical guarantees. Our code is available at https://git.io/J1IOy.
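To make the idea of estimating accuracy "with statistical guarantees" concrete, here is a minimal sketch. It is not the paper's algorithm: it uses the uniform random sampling baseline together with a standard Hoeffding bound, and the names `estimate_accuracy`, `human_oracle`, `budget`, and `delta` are hypothetical, introduced only for illustration.

```python
import math
import random


def estimate_accuracy(machine_scores, human_oracle, budget, delta=0.05, seed=0):
    """Estimate machine accuracy from a uniform random sample scored by humans.

    Illustrative assumption, not the paper's method: uniform sampling plus a
    Hoeffding interval. Returns (estimate, lower, upper); the interval
    [lower, upper] contains the true accuracy with probability >= 1 - delta.
    """
    rng = random.Random(seed)
    n = len(machine_scores)
    k = max(1, int(budget * n))  # number of responses sent to human raters
    # Draw with replacement so the samples are i.i.d., as Hoeffding assumes.
    sampled = [rng.randrange(n) for _ in range(k)]
    # human_oracle(i) stands in for asking a human rater to score response i.
    agree = sum(1 for i in sampled if machine_scores[i] == human_oracle(i))
    estimate = agree / k
    # Hoeffding: P(|estimate - truth| >= t) <= 2 * exp(-2 * k * t^2).
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * k))
    return estimate, max(0.0, estimate - half_width), min(1.0, estimate + half_width)


if __name__ == "__main__":
    # Synthetic data only: a machine that is wrong on every fifth response.
    human = [i % 4 for i in range(1000)]
    machine = [(s + 1) % 4 if i % 5 == 0 else s for i, s in enumerate(human)]
    print(estimate_accuracy(machine, lambda i: human[i], budget=0.3))
```

The interval width depends only on the number of human-scored responses and on `delta`, so a larger human budget tightens the guarantee regardless of which scoring model is being audited.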

