Benchmarks for Automated Commonsense Reasoning: A Survey

More than one hundred benchmarks have been developed to test the commonsense knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems. However, these benchmarks are often flawed, and many aspects of common sense remain untested. Consequently, we do not currently have any reliable way of measuring to what extent existing AI systems have achieved these abilities. This paper surveys the development and uses of AI commonsense benchmarks. We discuss the nature of common sense; the role of common sense in AI; the goals served by constructing commonsense benchmarks; and desirable features of commonsense benchmarks. We analyze the common flaws in benchmarks, and we argue that it is worthwhile to invest the work needed to ensure that benchmark examples are consistently high quality. We survey the various methods of constructing commonsense benchmarks. We enumerate 139 commonsense benchmarks that have been developed: 102 text-based, 18 image-based, 12 video-based, and 7 simulated physical environments. We discuss the gaps in the existing benchmarks and aspects of commonsense reasoning that are not addressed in any existing benchmark. We conclude with a number of recommendations for future development of commonsense AI benchmarks.

