Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of, and approaches to, scientific understanding. In this Roadmap, we propose a framework for creating a benchmark for scientific understanding, drawing on tools from the philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as the ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the ability to arrange information to produce an explanation, and the ability to infer how things would be different under different circumstances. The Scientific Understanding Benchmark (SUB), formed by a set of these tests, allows for the evaluation and comparison of different approaches. Benchmarking plays a crucial role in establishing trust, ensuring quality control, and providing a basis for performance evaluation. By aligning machine and human scientific understanding, we can improve their utility, ultimately advancing scientific understanding and helping to uncover new insights within machines.