External Stability Auditing to Test the Validity of Personality Prediction in AI Hiring

Automated hiring systems are among the fastest-developing of all high-stakes AI systems. Among these are algorithmic personality tests that use insights from psychometric testing, and promise to surface personality traits indicative of future success based on job seekers' resumes or social media profiles. We interrogate the validity of such systems using stability of the outputs they produce, noting that reliability is a necessary, but not a sufficient, condition for validity. Our approach is to (a) develop a methodology for an external audit of stability of predictions made by algorithmic personality tests, and (b) instantiate this methodology in an audit of two systems, Humantic AI and Crystal. Crucially, rather than challenging or affirming the assumptions made in psychometric testing -- that personality is a meaningful and measurable construct, and that personality traits are indicative of future success on the job -- we frame our methodology around testing the underlying assumptions made by the vendors of the algorithmic personality tests themselves. In our audit of Humantic AI and Crystal, we find that both systems show substantial instability with respect to key facets of measurement, and so cannot be considered valid testing instruments. For example, Crystal frequently computes different personality scores if the same resume is given in PDF vs. in raw text format, violating the assumption that the output of an algorithmic personality test is stable across job-irrelevant variations in the input. Among other notable findings is evidence of persistent -- and often incorrect -- data linkage by Humantic AI.
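The stability check described above can be illustrated with a minimal sketch. It assumes that each resume has been scored twice for one trait, once per input format (for example PDF and raw text), and that the paired scores are already available as plain lists. The function names, the sample numbers, and the choice of rank correlation plus mean absolute difference as stability measures are all illustrative assumptions, not the audit's actual procedure or either vendor's API.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def stability_report(scores_a, scores_b):
    """Compare paired scores from two treatments of the same inputs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired")
    return {
        # Do the two treatments order the candidates the same way?
        "rank_correlation": spearman(scores_a, scores_b),
        # How far apart are the scores for the same candidate, on average?
        "mean_abs_diff": mean(abs(a - b) for a, b in zip(scores_a, scores_b)),
    }


# Hypothetical scores for one trait across five resumes (made-up numbers).
pdf_scores = [62, 48, 75, 30, 55]
text_scores = [60, 52, 40, 35, 70]
report = stability_report(pdf_scores, text_scores)
```

A rank correlation near 1 together with a small mean absolute difference would be consistent with stability across the job-irrelevant variation; low correlation or large differences would indicate the kind of instability the audit reports. Passing this check would still not establish validity, since reliability is necessary but not sufficient.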

