探索大型语言模型的状态追踪能力 (Exploring State Tracking Capabilities of Large Language Models) - 专知论文

会员服务 ·

0

语言模型 · 大语言模型 · 基准 · 分析 · 实体 ·

Exploring State Tracking Capabilities of Large Language Models

翻译：探索大型语言模型的状态追踪能力

Kiamehr Rezaee,Jose Camacho-Collados,Mohammad Taher Pilehvar

Large Language Models (LLMs) have demonstrated impressive capabilities in solving complex tasks, including those requiring a certain level of reasoning. In this paper, we focus on state tracking, a problem where models need to keep track of the state governing a number of entities. To isolate the state tracking component from other factors, we propose a benchmark based on three well-defined state tracking tasks and analyse the performance of LLMs in different scenarios. The results indicate that the recent generation of LLMs (specifically, GPT-4 and Llama3) are capable of tracking state, especially when integrated with mechanisms such as Chain of Thought. However, models from the former generation, while understanding the task and being able to solve it at the initial stages, often fail at this task after a certain number of steps.

翻译：大型语言模型（LLMs）在解决复杂任务方面展现出卓越的能力，包括需要一定推理水平的任务。本文聚焦于状态追踪问题，即模型需要持续追踪多个实体所遵循的状态。为将状态追踪要素与其他因素分离，我们基于三个明确定义的状态追踪任务提出一个基准，并分析LLMs在不同场景下的表现。结果表明，新一代LLMs（特别是GPT-4和Llama3）能够有效追踪状态，尤其是在结合思维链等机制时。然而，前代模型虽然能理解任务并在初始阶段解决问题，但在经过一定步骤后往往无法完成该任务。

0

相关内容

语言模型

从语言模型到语言智能体，普林斯顿Shunyu Yao

从语言模型到语言智能体，普林斯顿Shunyu Yao

专知会员服务

62+阅读 · 2023年9月18日

【CVPR 2022】基于视觉-语言验证和迭代推理的视觉定位,Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

【CVPR 2022】基于视觉-语言验证和迭代推理的视觉定位,Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

专知会员服务

12+阅读 · 2022年3月19日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

专知会员服务

44+阅读 · 2020年4月30日

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

专知会员服务

68+阅读 · 2020年2月25日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

论文浅尝 | Interaction Embeddings for Prediction and Explanation

论文浅尝 | Interaction Embeddings for Prediction and Explanation

开放知识图谱

11+阅读 · 2019年2月1日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

30+阅读 · 2018年7月12日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

开放知识图谱

36+阅读 · 2018年3月30日

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

基于非对称群体兴趣相关性并融合情境与群体信任的Web服务推荐研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

基于动态分层与自学习的多智能体自适应协作模型

国家自然科学基金

17+阅读 · 2008年12月31日

Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models

Arxiv

0+阅读 · 12月14日

Active Hypothesis Testing under Computational Budgets with Applications to GWAS and LLM

Arxiv

0+阅读 · 12月1日

DyFuLM: An Advanced Multimodal Framework for Sentiment Analysis

Arxiv

0+阅读 · 12月1日

LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling

Arxiv

0+阅读 · 11月26日

A Novel Data-Dependent Learning Paradigm for Large Hypothesis Classes

Arxiv

0+阅读 · 11月13日

VIP会员

文章信息

相关主题

大语言模型

相关VIP内容

从语言模型到语言智能体，普林斯顿Shunyu Yao

从语言模型到语言智能体，普林斯顿Shunyu Yao

专知会员服务

62+阅读 · 2023年9月18日

【CVPR 2022】基于视觉-语言验证和迭代推理的视觉定位,Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

【CVPR 2022】基于视觉-语言验证和迭代推理的视觉定位,Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation

专知会员服务

12+阅读 · 2022年3月19日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

专知会员服务

44+阅读 · 2020年4月30日

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

【CVPR2020-UBC】改进小样本学习视觉分类，Few-Shot Visual Classification

专知会员服务

68+阅读 · 2020年2月25日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】面向真实世界音视联合语音识别的可扩展框架

《通过仿真与开源数据提升战略决策：机遇与局限》最新报告

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

评估大语言模型在科学发现中的作用

相关资讯

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

论文浅尝 | Interaction Embeddings for Prediction and Explanation

论文浅尝 | Interaction Embeddings for Prediction and Explanation

开放知识图谱

11+阅读 · 2019年2月1日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

30+阅读 · 2018年7月12日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

论文浅尝 | Know-Evolve: Deep Temporal Reasoning for Dynamic KG

开放知识图谱

36+阅读 · 2018年3月30日

相关论文

Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models

Arxiv

0+阅读 · 12月14日

Active Hypothesis Testing under Computational Budgets with Applications to GWAS and LLM

Arxiv

0+阅读 · 12月1日

DyFuLM: An Advanced Multimodal Framework for Sentiment Analysis

Arxiv

0+阅读 · 12月1日

LLaMCAT: Optimizing Large Language Model Inference with Cache Arbitration and Throttling

Arxiv

0+阅读 · 11月26日

A Novel Data-Dependent Learning Paradigm for Large Hypothesis Classes

Arxiv

0+阅读 · 11月13日

相关基金

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

基于非对称群体兴趣相关性并融合情境与群体信任的Web服务推荐研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于贝叶斯推理的模糊逻辑强化学习模型研究

国家自然科学基金

18+阅读 · 2012年12月31日

基于动态分层与自学习的多智能体自适应协作模型

国家自然科学基金

17+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员