Questions of fairness, robustness, and transparency are paramount to address before deploying NLP systems. Central to these concerns is the question of reliability: Can NLP systems reliably treat different demographics fairly and function correctly in diverse and noisy environments? To address this, we argue for the need for reliability testing and contextualize it among existing work on improving accountability. We show how adversarial attacks can be reframed for this goal, via a framework for developing reliability tests. We argue that reliability testing -- with an emphasis on interdisciplinary collaboration -- will enable rigorous and targeted testing, and aid in the enactment and enforcement of industry standards.
翻译:公平、稳健和透明的问题在部署国家劳工计划系统之前是解决的首要问题。这些问题的核心是可靠性问题:国家劳工计划系统能否可靠地公平对待不同的人口,并在多样和吵闹的环境中正确运作?为了解决这个问题,我们主张需要进行可靠性测试,并在现有的改进问责制的工作中考虑到可靠性问题。我们表明如何通过制定可靠性测试的框架,重新界定对抗性攻击来实现这一目标。我们主张,可靠性测试 -- -- 重点是跨学科合作 -- -- 将有助于严格和有的放矢的测试,并有助于工业标准的制定和执行。