Querying Large Language Models with SQL - 专知 (Zhuanzhi) Papers

SQL · Large Language Models · Language Models · Databases · Extraction

April 2, 2023

Querying Large Language Models with SQL

Mohammed Saeed, Nicola De Cao, Paolo Papotti

In many use cases, information is stored in text but is not available as structured data. However, extracting data from natural language text to precisely fit a schema, and thus enable querying, is a challenging task. With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents. Thus, we envision the use of SQL queries to cover a broad range of data that is not captured by traditional databases by tapping the information in LLMs. To ground this vision, we present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM. The main idea is to execute some operators of the query plan with prompts that retrieve data from the LLM. For a large class of SQL queries, querying LLMs returns well-structured relations, with encouraging qualitative results. Preliminary experimental results make pre-trained LLMs a promising addition to the field of database systems, introducing a new direction for hybrid query processing. However, we pinpoint several research challenges that must be addressed to build a DBMS that exploits LLMs. While some of these challenges necessitate integrating concepts from the NLP literature, others offer novel research avenues for the DB community.
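The idea of running some query-plan operators as prompts can be illustrated with a minimal Python sketch. This is not the Galois implementation, which the abstract does not describe in detail; the operator names (`llm_scan`, `llm_fetch_attribute`, `select_greater_than`), the prompt wording, and the `fake_llm` stub with its made-up city names and values are all assumptions introduced here for illustration. The sketch evaluates a plan for `SELECT name, population FROM city WHERE population > 1000000`, where the scan and the attribute lookup are answered by prompts and the selection runs as an ordinary in-memory operator.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for a real LLM client: maps a prompt to a text answer.
LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Canned answers so the sketch runs offline. The names and numbers are
    invented placeholders, not real data; a real system would call a model."""
    if prompt.startswith("List the name of every"):
        return "Alpha\nBeta\nGamma"
    population = {"Alpha": "2800000", "Beta": "2100000", "Gamma": "700000"}
    for city, value in population.items():
        if city in prompt:
            return value
    return "unknown"


def llm_scan(llm: LLM, relation: str, key: str) -> Iterator[Dict[str, str]]:
    """Leaf operator: one prompt asks the model for the key values of a relation."""
    answer = llm(f"List the {key} of every {relation}, one per line.")
    for line in answer.splitlines():
        if line.strip():
            yield {key: line.strip()}


def llm_fetch_attribute(llm: LLM, rows: Iterable[Dict[str, str]], relation: str,
                        key: str, attribute: str) -> Iterator[Dict[str, str]]:
    """Extend each tuple with one attribute retrieved by a per-tuple prompt."""
    for row in rows:
        answer = llm(f"What is the {attribute} of the {relation} {row[key]}? "
                     "Answer with a number.")
        yield {**row, attribute: answer.strip()}


def select_greater_than(rows: Iterable[Dict[str, str]], attribute: str,
                        threshold: int) -> Iterator[Dict[str, str]]:
    """Ordinary in-memory selection; rows with non-numeric answers are dropped."""
    for row in rows:
        try:
            if int(row[attribute]) > threshold:
                yield row
        except ValueError:
            continue


def run_query(llm: LLM) -> List[Dict[str, str]]:
    """Plan for: SELECT name, population FROM city WHERE population > 1000000."""
    scanned = llm_scan(llm, "city", "name")
    extended = llm_fetch_attribute(llm, scanned, "city", "name", "population")
    return list(select_greater_than(extended, "population", 1_000_000))


if __name__ == "__main__":
    for row in run_query(fake_llm):
        print(row)
```

Each operator consumes and produces an iterator of tuples, so prompt-backed operators compose with conventional ones in the same plan. Dropping rows whose answer does not parse as a number is one possible policy for noisy model output, chosen here only to keep the sketch short.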
