Large Language Models (LLMs) are increasingly used in real-world settings, yet their strategic decision-making abilities remain largely unexplored. To fully benefit from the potential of LLMs, it is essential to understand how they behave in complex social scenarios. Game theory, which is already widely used to model real-world interactions, provides a suitable framework for assessing these abilities. This work investigates the performance of LLMs in two canonical two-player non-zero-sum games, Stag Hunt and Prisoner's Dilemma. Our structured evaluation of GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B shows that, when making decisions in these games, each of these models is affected by at least one of the following systematic biases: positional bias, payoff bias, or behavioural bias. This indicates that LLMs do not fully rely on logical reasoning when making these strategic decisions. As a consequence, their performance drops when the game configuration is misaligned with the affecting biases. Under such misalignment, GPT-3.5, GPT-4-Turbo, GPT-4o, and Llama-3-8B show average performance drops of 32\%, 25\%, 34\%, and 29\%, respectively, in Stag Hunt, and 28\%, 16\%, 34\%, and 24\%, respectively, in Prisoner's Dilemma. Surprisingly, GPT-4o (a top performer on standard benchmarks) suffers the largest drop, suggesting that newer models are not addressing these issues. Interestingly, chain-of-thought (CoT) prompting, a commonly used method for improving the reasoning capabilities of LLMs, reduces the biases in GPT-3.5, GPT-4o, and Llama-3-8B but amplifies the bias in GPT-4-Turbo, indicating that CoT alone cannot fully serve as a robust solution to this problem. We perform several additional experiments that provide further insight into these observed behaviours.
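To make the evaluated setting concrete, the sketch below shows the standard one-shot payoff matrices for Stag Hunt and Prisoner's Dilemma and a minimal positional-bias probe in the spirit of the abstract: the same game is presented with the two actions listed in either order, and a systematic preference for whichever action appears first would indicate positional rather than payoff-driven choice. This is an illustrative assumption, not the paper's exact protocol; the payoff values and the `query_llm` placeholder are hypothetical.

\begin{verbatim}
# Illustrative sketch of a positional-bias probe for the two games.
# Payoff values and query_llm are assumptions, not the authors' setup.

# (row player's payoff, column player's payoff)
STAG_HUNT = {
    ("Stag", "Stag"): (4, 4), ("Stag", "Hare"): (1, 3),
    ("Hare", "Stag"): (3, 1), ("Hare", "Hare"): (2, 2),
}
PRISONERS_DILEMMA = {
    ("Cooperate", "Cooperate"): (3, 3), ("Cooperate", "Defect"): (0, 5),
    ("Defect", "Cooperate"): (5, 0), ("Defect", "Defect"): (1, 1),
}

def build_prompt(payoffs, actions):
    """Describe the game with the two actions listed in the given order."""
    lines = [f"You and another player each choose one of: "
             f"{actions[0]}, {actions[1]}."]
    for (a, b), (pa, pb) in payoffs.items():
        lines.append(f"If you choose {a} and they choose {b}, "
                     f"you get {pa} and they get {pb}.")
    lines.append("Which action do you choose? Answer with the action name only.")
    return "\n".join(lines)

def query_llm(prompt):
    """Placeholder for an actual model call (hypothetical)."""
    raise NotImplementedError

def positional_bias_probe(payoffs, actions, n_trials=20):
    """Present the same game with both action orderings and record how
    often the first-listed action is chosen; a high rate under both
    orderings suggests positional bias rather than payoff-based reasoning."""
    first_choice_rate = {}
    for order in (list(actions), list(actions)[::-1]):
        prompt = build_prompt(payoffs, order)
        picks_first = sum(query_llm(prompt).strip() == order[0]
                          for _ in range(n_trials))
        first_choice_rate[" / ".join(order)] = picks_first / n_trials
    return first_choice_rate
\end{verbatim}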