While Large Language Models (LLMs) have demonstrated remarkable capabilities, research shows that their effectiveness depends not only on explicit prompts but also on the broader context provided. This requirement is especially pronounced in software engineering, where the goals, architecture, and collaborative conventions of an existing project play a critical role in response quality. To meet this need, many AI coding assistants now let developers author persistent, machine-readable directives that encode a project's unique constraints. Although this practice is growing, the content of these directives remains unstudied. This paper presents a large-scale empirical study characterizing this emerging form of developer-provided context. Through a qualitative analysis of 401 open-source repositories containing Cursor rules, we developed a comprehensive taxonomy of the project context that developers consider essential, organized into five high-level themes: Conventions, Guidelines, Project Information, LLM Directives, and Examples. Our study also explores how this context varies across project types and programming languages, offering implications for the next generation of context-aware AI developer tools.