Recent efforts within the AI community have yielded impressive results on "soft theorem proving" over natural language sentences using language models. We propose a novel, generative adversarial framework for probing and improving these models' reasoning capabilities. Adversarial attacks in this domain suffer from the logical inconsistency problem, whereby perturbations to the input may inadvertently alter the gold label. Our Logically consistent AdVersarial Attacker, LAVA, addresses this by combining a structured generative process with a symbolic solver, guaranteeing logical consistency. Our framework successfully generates adversarial attacks and identifies weaknesses shared across multiple target models. Our analyses reveal naive heuristics and vulnerabilities in these models' reasoning capabilities, exposing an incomplete grasp of logical deduction over logic programs. Finally, beyond effective probing of these models, we show that training on the generated samples improves the target models' performance.
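To make the logical consistency requirement concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): it assumes rule bases expressed as Horn rules over ground facts, uses simple forward chaining as the symbolic solver, and accepts a candidate perturbation only if the query's entailment label is unchanged. The function names (`forward_chain`, `is_consistent_perturbation`) are illustrative and not taken from the source.

```python
# Hypothetical illustration of the logical-consistency check described in the abstract:
# a perturbed rule base is accepted only if a symbolic solver re-derives the same label.

Fact = tuple   # e.g. ("cat", "is", "kind")
Rule = tuple   # (body: tuple of Facts, head: Fact)


def forward_chain(facts: set, rules: list) -> set:
    """Compute the deductive closure of `facts` under Horn `rules`."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in closure and all(b in closure for b in body):
                closure.add(head)
                changed = True
    return closure


def label(facts: set, rules: list, query: Fact) -> bool:
    """Entailment label for `query` under the closed-world assumption."""
    return query in forward_chain(facts, rules)


def is_consistent_perturbation(orig_facts, orig_rules,
                               new_facts, new_rules, query) -> bool:
    """Keep a candidate adversarial example only if the gold label is preserved."""
    return label(orig_facts, orig_rules, query) == label(new_facts, new_rules, query)


if __name__ == "__main__":
    facts = {("cat", "is", "kind")}
    rules = [((("cat", "is", "kind"),), ("cat", "is", "nice"))]
    query = ("cat", "is", "nice")

    # A distractor rule whose body never fires: the label must stay True.
    perturbed_rules = rules + [((("dog", "is", "rough"),), ("cat", "is", "nice"))]
    print(is_consistent_perturbation(facts, rules, facts, perturbed_rules, query))  # True
```

In this toy setting the solver plays the role of an oracle: any perturbation that would flip the entailment label is rejected, so the generated attack is guaranteed to be logically consistent with its original label.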