The rapid advancement of Large Language Models (LLMs) has driven significant progress in Natural Language Interfaces to Databases (NLIDB). However, the widespread adoption of LLMs raises critical privacy and security concerns: during interactions, LLMs may unintentionally expose confidential database contents or be manipulated by attackers into exfiltrating data through seemingly benign queries. Existing defenses typically rely on rule-based heuristics or LLM agents to mitigate this leakage risk, but they still struggle against complex inference-based attacks, suffer from high false-positive rates, and often compromise the reliability of the generated SQL. To address these challenges, we propose \textsc{SafeNlidb}, a novel privacy-security alignment framework for LLM-based NLIDB. The framework features an automated pipeline that generates hybrid chain-of-thought interaction data from scratch, seamlessly combining implicit security reasoning with SQL generation. We further introduce reasoning warm-up and alternating preference optimization to overcome the multi-preference oscillation of Direct Preference Optimization (DPO), enabling LLMs to produce security-aware SQL through fine-grained reasoning without human-annotated preference data. Extensive experiments demonstrate that our method outperforms both larger-scale LLMs and ideal-setting baselines, achieving significant security improvements while preserving high utility. WARNING: This work may contain content that is offensive and harmful!
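For context, the alternating preference optimization mentioned above builds on the standard DPO objective (stated here in its widely known form; the symbols $\pi_\theta$, $\pi_{\mathrm{ref}}$, and $\beta$ are the policy, frozen reference model, and temperature from the original DPO formulation, not notation introduced by this paper):

\begin{equation*}
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma \!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
\end{equation*}

where $(x, y_w, y_l)$ is a prompt with preferred and dispreferred responses. When several distinct preference dimensions (here, security and SQL utility) are optimized jointly under this single contrastive loss, gradients from conflicting pairs can pull the policy back and forth, which is the oscillation behavior the alternating scheme is designed to mitigate.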