The problem of human trust in artificial intelligence is one of the most fundamental problems in applied machine learning. Our processes for evaluating AI trustworthiness have substantial ramifications for ML's impact on science, health, and humanity, yet confusion surrounds foundational concepts. What does it mean to trust an AI, and how do humans assess AI trustworthiness? What are the mechanisms for building trustworthy AI? And what is the role of interpretable ML in trust? Here, we draw from statistical learning theory and sociological lenses on human-automation trust to motivate an AI-as-tool framework, which distinguishes human-AI trust from human-AI-human trust. Evaluating an AI's contractual trustworthiness involves predicting future model behavior using behavior certificates (BCs) that aggregate behavioral evidence from diverse sources, including empirical out-of-distribution and out-of-task evaluation and theoretical proofs linking model architecture to behavior. We clarify the role of interpretability in trust with a ladder of model access. Interpretability (level 3) is neither necessary nor sufficient for trust, while the ability to run a black-box model at will (level 2) is both necessary and sufficient. While interpretability can offer benefits for trust, it can also incur costs. We clarify the ways interpretability can contribute to trust, while questioning its perceived centrality to trust in popular discourse. How can we empower people with tools to evaluate trust? Instead of trying to understand how a model works, we argue for understanding how a model behaves. Instead of opening up black boxes, we should produce more behavior certificates, and make them more correct, relevant, and understandable. We discuss how to build trusted and trustworthy AI responsibly.
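To make the notion of a behavior certificate slightly more concrete, the following is a minimal sketch in Python. All names, fields, and the aggregation rule here are illustrative assumptions rather than the paper's own definitions; it only shows the general idea of aggregating behavioral evidence from different evaluation sources into a single certificate about future model behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """One piece of behavioral evidence (hypothetical structure)."""
    source: str     # e.g. "out-of-distribution eval", "out-of-task eval", "architectural proof"
    claim: str      # the observed or proven behavior this evidence speaks to
    support: float  # strength of support in [0, 1]; e.g. 1.0 for a formal proof


@dataclass
class BehaviorCertificate:
    """A behavior certificate (BC): a prediction about future model behavior,
    backed by aggregated evidence. Field names are assumptions for illustration."""
    behavior: str                     # the contracted behavior being certified
    scope: str                        # conditions under which the behavior is expected to hold
    evidence: List[Evidence] = field(default_factory=list)

    def aggregate_support(self) -> float:
        """Naive aggregation: the certificate is only as strong as its weakest evidence."""
        if not self.evidence:
            return 0.0
        return min(e.support for e in self.evidence)


# Usage sketch with made-up numbers
bc = BehaviorCertificate(
    behavior="maintains >0.9 recall on data from unseen hospitals",
    scope="chest X-ray triage, adult patients",
    evidence=[
        Evidence("out-of-distribution eval", "recall 0.93 on a held-out hospital", 0.8),
        Evidence("out-of-task eval", "performance stable under label shift", 0.6),
    ],
)
print(bc.aggregate_support())  # 0.6
```

The `min` aggregation is just one conservative design choice for this sketch; in practice, how evidence from empirical evaluations and theoretical proofs is weighed and combined is itself part of making certificates correct, relevant, and understandable.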