We study the problem of decomposing a complex text-to-SQL task into smaller sub-tasks and how such a decomposition can significantly improve the performance of Large Language Models (LLMs) in the reasoning process. There is currently a considerable gap between the performance of fine-tuned models and that of prompting approaches using LLMs on challenging text-to-SQL datasets such as Spider. We show that SQL queries, despite their declarative structure, can be broken down into sub-problems, and that the solutions of those sub-problems can be fed into LLMs to significantly improve their performance. Our experiments with three LLMs show that this approach consistently improves their performance by roughly 10%, pushing their accuracy towards the state of the art and even outperforming large fine-tuned models on the Spider holdout test set.
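To make the idea concrete, here is a minimal sketch of decomposed prompting for text-to-SQL. It is an illustration of the general pattern described above, not the paper's exact pipeline: the particular sub-tasks (schema linking, query-structure planning) and the call_llm wrapper are assumptions introduced for this example.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM completion call (any API client)."""
    raise NotImplementedError("wire up an LLM client here")


def text_to_sql(question: str, schema: str) -> str:
    # Sub-task 1: identify the tables and columns the question refers to.
    links = call_llm(
        f"Schema:\n{schema}\n\nQuestion: {question}\n"
        "List the tables and columns needed to answer the question."
    )
    # Sub-task 2: determine the query's structure (joins, nesting, aggregation).
    plan = call_llm(
        f"Question: {question}\nRelevant schema items: {links}\n"
        "Describe the query structure: which joins, aggregations, "
        "or nested sub-queries are required?"
    )
    # Final step: feed the sub-problem solutions back into the prompt, so the
    # model generates SQL with the harder reasoning steps already resolved.
    return call_llm(
        f"Schema:\n{schema}\nQuestion: {question}\n"
        f"Relevant items: {links}\nQuery plan: {plan}\n"
        "Write the SQL query."
    )
```

The design point is that each intermediate answer becomes context for the next prompt, rather than asking the model to produce the full query in one step.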