Deploying accurate Text-to-SQL systems at the enterprise level poses a difficult trilemma among cost, security, and performance. Current solutions force enterprises to choose between expensive, proprietary Large Language Models (LLMs) and low-performing Small Language Models (SLMs). Efforts to improve SLMs often rely on distilling reasoning from LLMs via unstructured Chain-of-Thought (CoT) traces, a process that remains inherently ambiguous. We instead hypothesize that a formal, structured reasoning representation provides a clearer, more reliable teaching signal, since the Text-to-SQL task demands explicit and precise logical steps. To test this hypothesis, we propose Struct-SQL, a novel Knowledge Distillation (KD) framework that trains an SLM to emulate a powerful LLM. Specifically, we adopt the query execution plan as a formal blueprint from which to derive this structured reasoning. Our SLM, distilled with structured CoT, achieves an absolute improvement of 8.1% over an unstructured-CoT distillation baseline. A detailed error analysis reveals that a key factor in this gain is a marked reduction in syntactic errors, demonstrating that teaching a model to reason over a structured logical blueprint yields more reliable SQL generation in SLMs.
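To make the notion of a "query execution plan as a formal blueprint" concrete, the following minimal sketch shows how such a structured, ordered trace can be extracted from a database engine. It uses SQLite's `EXPLAIN QUERY PLAN` purely for illustration; the paper does not specify the DBMS or plan format used, and the schema here is hypothetical.

```python
# Hedged sketch: extracting a query execution plan as an ordered,
# structured trace of logical steps (assumed SQLite; illustrative schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
CREATE INDEX idx_dept ON employees(dept);
""")

sql = "SELECT name FROM employees WHERE dept = 'Sales' ORDER BY salary DESC"

# Each returned row is (id, parent, notused, detail); the ordered 'detail'
# strings describe how the engine resolves the query step by step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
for step in plan:
    print(step)
```

A distillation pipeline in this spirit could serialize such plan steps into the teacher's reasoning trace, giving the SLM an unambiguous logical scaffold rather than free-form natural-language CoT.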