Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, Maysa Malfiza Garcia de Macedo

From arXiv. 16 pages, 2 figures, 6 tables. Under review for publication.

With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is these models' "aptitude" for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed the responses under two conditions: a harmful response shown together with its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic context. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduced new metrics for training and evaluating mitigation strategies, along with insights for human-AI evaluation studies.

