Evaluating Verifiability in Generative Search Engines - Zhuanzhi Papers

April 19, 2023

Evaluating Verifiability in Generative Search Engines

Nelson F. Liu, Tianyi Zhang, Percy Liang

From arXiv, 25 pages, 12 figures

Generative search engines directly generate responses to user queries, along with in-line citations. A prerequisite trait of a trustworthy generative search engine is verifiability, i.e., systems should cite comprehensively (high citation recall; all statements are fully supported by citations) and accurately (high citation precision; every cite supports its associated statement). We conduct human evaluation to audit four popular generative search engines -- Bing Chat, NeevaAI, perplexity.ai, and YouChat -- across a diverse set of queries from a variety of sources (e.g., historical Google user queries, dynamically-collected open-ended questions on Reddit, etc.). We find that responses from existing generative search engines are fluent and appear informative, but frequently contain unsupported statements and inaccurate citations: on average, a mere 51.5% of generated sentences are fully supported by citations and only 74.5% of citations support their associated sentence. We believe that these results are concerningly low for systems that may serve as a primary tool for information-seeking users, especially given their facade of trustworthiness. We hope that our results further motivate the development of trustworthy generative search engines and help researchers and users better understand the shortcomings of existing commercial systems.
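The two metrics defined in the abstract can be illustrated with a minimal sketch. It assumes each generated sentence carries human judgments of whether it is fully supported and whether each of its citations supports it. The `Sentence` class, the function names, and the toy annotations are hypothetical and are not taken from the paper, whose evaluation protocol may differ in its details.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sentence:
    """One generated sentence with hypothetical human annotations."""
    text: str
    # Annotator judgment: do the cited sources fully support the sentence?
    fully_supported: bool
    # One flag per in-line citation: does that citation support the sentence?
    citation_supports: List[bool] = field(default_factory=list)


def citation_recall(sentences: List[Sentence]) -> float:
    """Fraction of sentences that are fully supported by their citations."""
    if not sentences:
        return 0.0
    return sum(s.fully_supported for s in sentences) / len(sentences)


def citation_precision(sentences: List[Sentence]) -> float:
    """Fraction of citations that support their associated sentence."""
    flags = [flag for s in sentences for flag in s.citation_supports]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Toy annotations (made up for illustration only).
response = [
    Sentence("Claim A [1].", True, [True]),
    Sentence("Claim B [1][2].", True, [True, False]),
    Sentence("Claim C.", False, []),
    Sentence("Claim D [3].", False, [False]),
]

print(citation_recall(response))     # 2 of 4 sentences fully supported -> 0.5
print(citation_precision(response))  # 2 of 4 citations supportive -> 0.5
```

Under this simplified reading, the reported averages correspond to a recall of about 0.515 and a precision of about 0.745 across the audited systems.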
