Semantic parsing is a technique aimed at constructing a structured representation of the meaning of a natural-language question. Recent few-shot language models trained on code have demonstrated superior performance in generating these representations compared to traditional unimodal language models fine-tuned on downstream tasks. Despite these advances, existing fine-tuned neural semantic parsers remain susceptible to adversarial attacks on their natural-language inputs. While it has been established that the robustness of smaller semantic parsers can be improved through adversarial training, this approach is not feasible for large language models in real-world scenarios, as it requires both substantial computational resources and expensive human annotation of in-domain semantic parsing data. This paper presents the first empirical study of the adversarial robustness of a large prompt-based language model of code, \codex. Our results show that state-of-the-art (SOTA) code-language models are vulnerable to carefully crafted adversarial examples. To address this challenge, we propose methods for improving robustness that do not require large amounts of labeled data or heavy computational resources.