Artificial or Just Artful? Do LLMs Bend the Rules in Programming?

Large Language Models (LLMs) are widely used for automated code generation, yet their apparent successes often mask a tension between pretraining objectives and alignment choices. Pretraining encourages models to exploit every available signal to maximize success, whereas alignment, whether through fine-tuning or prompting, may restrict the use of those signals. This conflict is especially salient in agentic AI settings, for instance when an agent has access to unit tests that, although intended for validation, act as strong contextual signals that can be leveraged regardless of explicit prohibitions. In this paper, we investigate how LLMs adapt their code generation strategies when exposed to test cases under different prompting conditions. Using the BigCodeBench (Hard) dataset, we design five prompting conditions that manipulate test visibility and impose explicit or implicit restrictions on the use of the tests. We evaluate five LLMs (four open-source and one closed-source) on correctness, code similarity, program size, and code churn, and analyze cross-model consistency to identify recurring adaptation strategies. Our results show that test visibility dramatically alters performance: correctness nearly doubles for some models, while explicit restrictions or partial exposure only partially mitigate this effect. Beyond raw performance, we identify four recurring adaptation strategies, with test-driven refinement emerging as the most frequent. These results highlight how LLMs adapt their behavior when exposed to contextual signals that conflict with explicit instructions, providing useful insight into how models reconcile pretraining objectives with alignment constraints.
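
The abstract names code similarity, program size, and code churn as evaluation axes but does not define them. As a rough illustration only, the sketch below shows one plausible way to compute such quantities for two generations of the same task, using Python's standard `difflib`. The function names (`program_size`, `code_similarity`, `code_churn`) and the line-based definitions are assumptions made for this example, not the paper's actual metrics.

```python
import difflib


def program_size(code: str) -> int:
    """Assumed size metric: number of non-blank lines."""
    return sum(1 for line in code.splitlines() if line.strip())


def code_similarity(a: str, b: str) -> float:
    """Assumed similarity metric: difflib ratio over lines, in [0, 1]."""
    matcher = difflib.SequenceMatcher(None, a.splitlines(), b.splitlines())
    return matcher.ratio()


def code_churn(a: str, b: str) -> int:
    """Assumed churn metric: lines removed from `a` plus lines added in `b`."""
    matcher = difflib.SequenceMatcher(None, a.splitlines(), b.splitlines())
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added + removed


# Hypothetical pair: a solution generated without tests in context,
# and one generated with tests visible.
baseline = "def add(a, b):\n    return a + b\n"
revised = (
    "def add(a, b):\n"
    "    if a is None:\n"
    "        return b\n"
    "    return a + b\n"
)

print(program_size(baseline), program_size(revised))
print(code_similarity(baseline, revised))
print(code_churn(baseline, revised))  # 2 lines added, 0 removed
```

Line-level diffs keep the sketch simple. A token-level or AST-based comparison would be less sensitive to formatting changes, at the cost of extra dependencies.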
