Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users. Repository descriptions are one of the first points of contact for users visiting a repository. However, repository owners often fail to provide a high-quality description: they use vague terms, explain the repository's purpose poorly, or omit the description entirely. In this work, we examine the current practice of writing GitHub repository descriptions. Based on this investigation, we propose the LSP (Language, Software technology, and Purpose) template for formulating clear, concise, and informative descriptions of GitHub repositories. To understand the extent to which current automated techniques can support generating repository descriptions, we compare the performance of state-of-the-art text summarization methods on this task. Finally, our user study with GitHub users reveals that automated summarization is adequate for generating default repository descriptions, while descriptions that follow the LSP template are the most effective instrument for communicating with GitHub users.