Software developers rely heavily on Stack Overflow for programming-related information. Increasingly, they turn to community question-and-answer forums such as Stack Overflow to find code examples showing how to accomplish a particular coding task, which is often considered more efficient than working from source documentation, tutorials, or full worked examples. However, because these forums are complex and contain a very large volume of information, developers can be overwhelmed, which makes it hard to find, or even be aware of, the most relevant code examples for their needs. To alleviate this issue, we present Que2Code, a query-driven code recommendation tool that identifies the best code snippets for a user query from Stack Overflow posts. Our approach has two main stages: (i) semantically-equivalent question retrieval and (ii) best code snippet recommendation. To assess the proposed model, we conduct a large-scale experiment that evaluates the semantically-equivalent question retrieval task and the best code snippet recommendation task separately on Python and Java datasets from Stack Overflow. We also perform a human study to measure how real-world developers perceive the results generated by our model. Both the automatic and human evaluation results demonstrate the promising performance of our model, and we have released our code and data to assist other researchers.
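To make the two-stage structure concrete, the following is a minimal sketch of a retrieve-then-recommend pipeline. It is not the Que2Code model: token-overlap (Jaccard) similarity stands in for semantically-equivalent question retrieval, and answer vote count stands in for best code snippet recommendation. The names `Post`, `POSTS`, `retrieve_similar`, `recommend_snippet`, and `two_stage_sketch`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    question: str  # question title (hypothetical toy data)
    snippet: str   # code snippet taken from an answer
    votes: int     # answer score, used here as a stand-in quality signal


def tokens(text: str) -> set:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Token-overlap similarity; an assumption, not the paper's retrieval model."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_similar(query: str, posts: list, k: int = 2) -> list:
    """Stage (i): return the k posts whose questions best match the query."""
    q = tokens(query)
    ranked = sorted(posts, key=lambda p: jaccard(q, tokens(p.question)), reverse=True)
    return ranked[:k]


def recommend_snippet(candidates: list) -> str:
    """Stage (ii): pick one snippet among the candidates (here: most votes)."""
    return max(candidates, key=lambda p: p.votes).snippet


def two_stage_sketch(query: str, posts: list, k: int = 2) -> str:
    """Chain both stages: retrieve similar questions, then recommend a snippet."""
    return recommend_snippet(retrieve_similar(query, posts, k))


POSTS = [
    Post("How to reverse a list in Python", "xs[::-1]", 120),
    Post("Reverse a list in Python in place", "xs.reverse()", 45),
    Post("How to read a file line by line in Python", "for line in open(path): ...", 300),
]

if __name__ == "__main__":
    print(two_stage_sketch("reverse a python list", POSTS))
```

Separating the stages means the highly voted but unrelated file-reading post is filtered out during retrieval before any snippet ranking takes place, which is the reason for retrieving equivalent questions first.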