Large language models are becoming increasingly pervasive in society through their deployment in sociotechnical systems. Yet these models, whether used for classification or generation, have been shown to be biased and to behave irresponsibly, causing harm to people at scale. It is therefore crucial to audit them rigorously. Existing auditing tools rely on humans, AI, or both to find failures. In this work, we draw upon literature in human-AI collaboration and sensemaking, and conduct interviews with research experts in safe and fair AI, to build upon the auditing tool AdaTest (Ribeiro and Lundberg, 2022), which is powered by a generative large language model (LLM). Through the design process we highlight the importance of sensemaking and human-AI communication for leveraging the complementary strengths of humans and generative models in collaborative auditing. To evaluate the effectiveness of the augmented tool, AdaTest++, we conduct user studies with participants auditing two commercial language models: OpenAI's GPT-3 and Azure's sentiment analysis model. Qualitative analysis shows that AdaTest++ effectively leverages human strengths such as schematization, hypothesis formation, and hypothesis testing. Further, with our tool, participants identified a variety of failure modes, covering 26 different topics across the two tasks, including failures previously documented in formal audits as well as ones that were under-reported.