Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study - 专知论文

会员服务 ·

0

INFORMS · LSH · 样例 · 代码 · 自动问答 ·

2023 年 5 月 4 日

Improving Code Example Recommendations on Informal Documentation Using BERT and Query-Aware LSH: A Comparative Study

翻译：暂无翻译

Sajjad Rahmani,AmirHossein Naghshzan,Latifa Guerrouj

The study of code example recommendation has been conducted extensively in the past and recently in order to assist developers in their software development tasks. This is because developers often spend significant time searching for relevant code examples on the internet, utilizing open-source projects and informal documentation. For finding useful code examples, informal documentation, such as Stack Overflow discussions and forums, can be invaluable. We have focused our research on Stack Overflow, which is a popular resource for discussing different topics among software developers. For increasing the quality of the recommended code examples, we have collected and recommended the best code examples in the Java programming language. We have utilized BERT in our approach, which is a Large Language Model (LLM) for text representation that can effectively extract semantic information from textual data. Our first step involved using BERT to convert code examples into numerical vectors. Subsequently, we applied LSH to identify Approximate Nearest Neighbors (ANN). Our research involved the implementation of two variants of this approach, namely the Random Hyperplane-based LSH and the Query-Aware LSH. Our study compared two algorithms using four parameters: HitRate, Mean Reciprocal Rank (MRR), Average Execution Time, and Relevance. The results of our analysis revealed that the Query- Aware (QA) approach outperformed the Random Hyperplane-based (RH) approach in terms of HitRate. Specifically, the QA approach achieved a HitRate improvement of 20% to 35% for query pairs compared to the RH approach. Creating hashing tables and assigning data samples to buckets using the QA approach is at least four times faster than the RH approach. The QA approach returns code examples within milliseconds, while it takes several seconds (sec) for the RH approach to recommend code examples.

翻译：暂无翻译

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新六篇推荐系统相关论文—注意力机制、多任务、协同跨网络、非结构化文本、TransRev、章节推荐

【论文推荐】最新六篇推荐系统相关论文—注意力机制、多任务、协同跨网络、非结构化文本、TransRev、章节推荐

专知

12+阅读 · 2018年4月26日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

复多项式的核拓扑熵

国家自然科学基金

0+阅读 · 2015年12月31日

Kahler 曲面中特殊曲面的研究

国家自然科学基金

0+阅读 · 2014年12月31日

IL-35在动脉粥样硬化进程中的作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

量子群与Tewilliger代数的相关问题研究

国家自然科学基金

1+阅读 · 2013年12月31日

Beclin 1在阿尔茨海默病样神经元损伤中的调控机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

几类半群在图论和形式语言学中的应用

国家自然科学基金

0+阅读 · 2013年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

基于代谢组学方法的羌活药材代谢表型及其与品质的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

过渡金属催化卤代芳烃对芳醛的Barbier类型反应研究

国家自然科学基金

0+阅读 · 2009年12月31日

Comparative Evaluation of Recent Universal Adversarial Perturbations in Image Classification

Arxiv

0+阅读 · 2023年6月20日

ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models

Arxiv

0+阅读 · 2023年6月19日

Constructions and bounds for codes with restricted overlaps

Arxiv

0+阅读 · 2023年6月19日

MOSPC: MOS Prediction Based on Pairwise Comparison

Arxiv

0+阅读 · 2023年6月18日

What is the state of the art? Accounting for multiplicity in machine learning benchmark performance

Arxiv

0+阅读 · 2023年6月17日

Adversaries with Limited Information in the Friedkin--Johnsen Model

Arxiv

0+阅读 · 2023年6月17日

Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing

Arxiv

0+阅读 · 2023年6月17日

A Survey on Knowledge Graph-Based Recommender Systems

Arxiv

92+阅读 · 2020年2月28日

Contextual and Position-Aware Factorization Machines for Sentiment Classification

Arxiv

13+阅读 · 2018年1月18日

Adversarial Learning for Chinese NER from Crowd Annotations

Arxiv

15+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

自动驾驶轨迹规划中的基础模型：进展综述与开放挑战

《用于提升多域战备的大型语言模型辅助场景生成器》报告

【斯坦福博士论文】为人类使用优化 AI 模型

国防领域人工智能规模化应用的理论与实践

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新六篇推荐系统相关论文—注意力机制、多任务、协同跨网络、非结构化文本、TransRev、章节推荐

【论文推荐】最新六篇推荐系统相关论文—注意力机制、多任务、协同跨网络、非结构化文本、TransRev、章节推荐

专知

12+阅读 · 2018年4月26日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Comparative Evaluation of Recent Universal Adversarial Perturbations in Image Classification

Arxiv

0+阅读 · 2023年6月20日

ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models

Arxiv

0+阅读 · 2023年6月19日

Constructions and bounds for codes with restricted overlaps

Arxiv

0+阅读 · 2023年6月19日

MOSPC: MOS Prediction Based on Pairwise Comparison

Arxiv

0+阅读 · 2023年6月18日

What is the state of the art? Accounting for multiplicity in machine learning benchmark performance

Arxiv

0+阅读 · 2023年6月17日

Adversaries with Limited Information in the Friedkin--Johnsen Model

Arxiv

0+阅读 · 2023年6月17日

Achilles' Heels: Vulnerable Record Identification in Synthetic Data Publishing

Arxiv

0+阅读 · 2023年6月17日

A Survey on Knowledge Graph-Based Recommender Systems

Arxiv

92+阅读 · 2020年2月28日

Contextual and Position-Aware Factorization Machines for Sentiment Classification

Arxiv

13+阅读 · 2018年1月18日

Adversarial Learning for Chinese NER from Crowd Annotations

Arxiv

15+阅读 · 2018年1月16日

相关基金

复多项式的核拓扑熵

国家自然科学基金

0+阅读 · 2015年12月31日

Kahler 曲面中特殊曲面的研究

国家自然科学基金

0+阅读 · 2014年12月31日

IL-35在动脉粥样硬化进程中的作用和机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

量子群与Tewilliger代数的相关问题研究

国家自然科学基金

1+阅读 · 2013年12月31日

Beclin 1在阿尔茨海默病样神经元损伤中的调控机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

几类半群在图论和形式语言学中的应用

国家自然科学基金

0+阅读 · 2013年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

基于代谢组学方法的羌活药材代谢表型及其与品质的相关性研究

国家自然科学基金

0+阅读 · 2009年12月31日

过渡金属催化卤代芳烃对芳醛的Barbier类型反应研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员