ANALOGICAL - A New Benchmark for Analogy of Long Text for Large Language Models - 专知论文

会员服务 ·

0

语言模型化 · 可辨认的 · Taxonomy · 词向量表示 · MoDELS ·

2023 年 5 月 8 日

ANALOGICAL - A New Benchmark for Analogy of Long Text for Large Language Models

翻译：暂无翻译

Thilini Wijesiriwardene,Ruwan Wickramarachchi,Bimal G. Gajera,Shreeyash Mukul Gowaikar,Chandan Gupta,Aman Chadha,Aishwarya Naresh Reganti,Amit Sheth,Amitava Das

from arxiv, Accepted as a long paper at Findings of ACL 2023

Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure of evaluating the quality of word embedding methods such as word2vec. Modern large language models (LLMs), however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and there are only a few investigations on whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of long text with six levels of complexity -- (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs in identifying analogical pairs in the semantic vector space (e.g., "I can speak two languages" should be closer to "I am bilingual" while "I like chocolate" and "I do not like chocolate" should be orthogonal). Our evaluation finds that it is increasingly challenging for LLMs to identify analogies when going up the analogy taxonomy.

翻译：暂无翻译

0

相关内容

语言模型化

语言模型化

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

85+阅读 · 2023年6月19日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

地基InSAR高边坡三维变形提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

常微分方程中的几个经典问题

国家自然科学基金

2+阅读 · 2012年12月31日

非自治微分系统的Hartman-Grobman线性化及相关问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

人肝细胞特异性分泌III型干扰素的调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Levy扩散过程与非局部偏微分方程

国家自然科学基金

1+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

考虑弥散尺度效应的裂隙介质中溶质运移模型及模拟研究

国家自然科学基金

0+阅读 · 2012年12月31日

转录因子ZNF580 在缺血-再灌注脑损伤中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Poincare-Nekhoroshev映射理论对非线性动力系统的应用

国家自然科学基金

0+阅读 · 2011年12月31日

A Survey on Multimodal Large Language Models

Arxiv

25+阅读 · 2023年6月23日

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Arxiv

0+阅读 · 2023年6月23日

A Natural Bias for Language Generation Models

Arxiv

0+阅读 · 2023年6月23日

Exploring Human-Like Translation Strategy with Large Language Models

Arxiv

0+阅读 · 2023年6月22日

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Arxiv

0+阅读 · 2023年6月21日

Limits for Learning with Language Models

Arxiv

0+阅读 · 2023年6月21日

Graph Neural Networks for Natural Language Processing: A Survey

Arxiv

36+阅读 · 2021年6月10日

A Survey of Machine Learning for Computer Architecture and Systems

Arxiv

18+阅读 · 2021年2月16日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

VIP会员

文章信息

相关主题

语言模型化

词向量表示

相关VIP内容

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

CVPR 2023开会了！谷歌等最新《视觉上理解和解释注意力》教程，附152页ppt

专知会员服务

85+阅读 · 2023年6月19日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《在单一作战合成环境（SSE）中运用人工智能与大型语言模型以提供灵活人文地形及可信角色组》报告

《俄罗斯的未来战争方式第二部分：核威慑》报告

《提示战争：大语言模型如何决定军事干预》报告

《俄罗斯的未来战争方式第三部分：军事改革》报告

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

GNN 新基准！Long Range Graph Benchmark

GNN 新基准！Long Range Graph Benchmark

图与推荐

0+阅读 · 2022年10月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

上百种预训练中文词向量：Chinese-Word-Vectors

上百种预训练中文词向量：Chinese-Word-Vectors

AINLP

23+阅读 · 2019年2月26日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

相关论文

A Survey on Multimodal Large Language Models

Arxiv

25+阅读 · 2023年6月23日

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Arxiv

0+阅读 · 2023年6月23日

A Natural Bias for Language Generation Models

Arxiv

0+阅读 · 2023年6月23日

Exploring Human-Like Translation Strategy with Large Language Models

Arxiv

0+阅读 · 2023年6月22日

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

Arxiv

0+阅读 · 2023年6月21日

Limits for Learning with Language Models

Arxiv

0+阅读 · 2023年6月21日

Graph Neural Networks for Natural Language Processing: A Survey

Arxiv

36+阅读 · 2021年6月10日

A Survey of Machine Learning for Computer Architecture and Systems

Arxiv

18+阅读 · 2021年2月16日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

相关基金

膜蛋白介导受IRES调控的cyclin B1促进食管癌转移的作用机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

地基InSAR高边坡三维变形提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

常微分方程中的几个经典问题

国家自然科学基金

2+阅读 · 2012年12月31日

非自治微分系统的Hartman-Grobman线性化及相关问题的研究

国家自然科学基金

0+阅读 · 2012年12月31日

人肝细胞特异性分泌III型干扰素的调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Levy扩散过程与非局部偏微分方程

国家自然科学基金

1+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

考虑弥散尺度效应的裂隙介质中溶质运移模型及模拟研究

国家自然科学基金

0+阅读 · 2012年12月31日

转录因子ZNF580 在缺血-再灌注脑损伤中的作用及分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

Poincare-Nekhoroshev映射理论对非线性动力系统的应用

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员