代码QA:源代码理解问答数据集 (CodeQA: A Question Answering Dataset for Source Code Comprehension) - 专知论文

会员服务 ·

0

自动问答 · 数据集 · 语义分析 · Processing（编程语言） · 机器阅读理解 ·

2021 年 9 月 17 日

CodeQA: A Question Answering Dataset for Source Code Comprehension

翻译：代码QA:源代码理解问答数据集

Chenxiao Liu,Xiaojun Wan

from arxiv, Findings of EMNLP 2021

We propose CodeQA, a free-form question answering dataset for the purpose of source code comprehension: given a code snippet and a question, a textual answer is required to be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs. To obtain natural and faithful questions and answers, we implement syntactic rules and semantic analysis to transform code comments into question-answer pairs. We present the construction process and conduct systematic analysis of our dataset. Experiment results achieved by several neural baselines on our dataset are shown and discussed. While research on question-answering and machine reading comprehension develops rapidly, few prior work has drawn attention to code question answering. This new dataset can serve as a useful research benchmark for source code comprehension.

翻译：我们提出代码QA,这是一个自由形式回答问题的数据,用于源代码理解:如果有一个代码片断和一个问题,则需要生成文本回答。代码QA包含一个包含119,778个问答配对的爪哇数据集和一个包含70,085个问答配对的Python数据集。为了获得自然和忠实的问答,我们实施了合成规则和语义分析,将代码评论转换成问答配对。我们介绍了构建过程,并对我们的数据集进行了系统分析。展示和讨论了我们数据集的若干神经基线所取得的实验结果。虽然关于问答和机器阅读理解的研究工作迅速发展,但很少有先前的工作提请注意代码回答。这个新的数据集可以作为源代码理解的有用研究基准。

0

相关内容

自动问答

自动问答（Question Answering, QA）是指利用计算机自动回答用户所提出的问题以满足用户知识需求的任务。不同于现有搜索引擎，问答系统是信息服务的一种高级形式，系统返回用户的不再是基于关键词匹配排序的文档列表，而是精准的自然语言答案。近年来，随着人工智能的飞速发展，自动问答已经成为倍受关注且发展前景广泛的研究方向。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

近期必读的七篇AAAI 2021【问答（QA）】相关论文和代码

专知会员服务

55+阅读 · 2021年2月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【2020新书】Python大数据处理，Mastering Large Datasets with Python

【2020新书】Python大数据处理，Mastering Large Datasets with Python

专知会员服务

54+阅读 · 2020年2月2日

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

专知会员服务

197+阅读 · 2020年2月1日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【资源】问答阅读理解资源列表

【资源】问答阅读理解资源列表

专知

3+阅读 · 2020年7月25日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

已删除

雪球

6+阅读 · 2018年8月19日

开源代码上新！6 份最新「Paper + Code」 | PaperDaily #17

开源代码上新！6 份最新「Paper + Code」 | PaperDaily #17

PaperWeekly

5+阅读 · 2017年11月23日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】自动特征工程开源框架

【推荐】自动特征工程开源框架

机器学习研究会

17+阅读 · 2017年11月7日

【数据集】新的YELP数据集官方下载

【数据集】新的YELP数据集官方下载

机器学习研究会

16+阅读 · 2017年8月31日

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Arxiv

0+阅读 · 2021年11月8日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test

Arxiv

4+阅读 · 2019年9月8日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Sogou Machine Reading Comprehension Toolkit

Arxiv

8+阅读 · 2019年3月28日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

Adversarial TableQA: Attention Supervision for Question Answering on Tables

Arxiv

4+阅读 · 2018年10月18日

Improving Question Answering by Commonsense-Based Pre-Training

Arxiv

5+阅读 · 2018年10月5日

Knowledge Based Machine Reading Comprehension

Knowledge Based Machine Reading Comprehension

Arxiv

4+阅读 · 2018年9月12日

QuAC : Question Answering in Context

QuAC : Question Answering in Context

Arxiv

4+阅读 · 2018年8月21日

VIP会员

文章信息

相关主题

Processing（编程语言）

机器阅读理解

相关VIP内容

近期必读的七篇AAAI 2021【问答（QA）】相关论文和代码

专知会员服务

55+阅读 · 2021年2月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【2020新书】Python大数据处理，Mastering Large Datasets with Python

【2020新书】Python大数据处理，Mastering Large Datasets with Python

专知会员服务

54+阅读 · 2020年2月2日

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

【2020新书】Python大数据处理，Mastering Large Datasets with Python，311页pdf

专知会员服务

197+阅读 · 2020年2月1日

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

【NLP模型的跨语言/跨领域迁移】《Transferring NLP models across languages and domains》

专知会员服务

43+阅读 · 2019年11月25日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

【资源】问答阅读理解资源列表

【资源】问答阅读理解资源列表

专知

3+阅读 · 2020年7月25日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

计算机 | EMNLP 2019等国际会议信息6条

计算机 | EMNLP 2019等国际会议信息6条

Call4Papers

18+阅读 · 2019年4月26日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

已删除

雪球

6+阅读 · 2018年8月19日

开源代码上新！6 份最新「Paper + Code」 | PaperDaily #17

开源代码上新！6 份最新「Paper + Code」 | PaperDaily #17

PaperWeekly

5+阅读 · 2017年11月23日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】自动特征工程开源框架

【推荐】自动特征工程开源框架

机器学习研究会

17+阅读 · 2017年11月7日

【数据集】新的YELP数据集官方下载

【数据集】新的YELP数据集官方下载

机器学习研究会

16+阅读 · 2017年8月31日

相关论文

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

Arxiv

0+阅读 · 2021年11月8日

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

Arxiv

12+阅读 · 2020年12月14日

Commonsense Knowledge + BERT for Level 2 Reading Comprehension Ability Test

Arxiv

4+阅读 · 2019年9月8日

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Arxiv

3+阅读 · 2019年5月10日

Sogou Machine Reading Comprehension Toolkit

Arxiv

8+阅读 · 2019年3月28日

Visual Question Answering as Reading Comprehension

Arxiv

3+阅读 · 2018年11月29日

Adversarial TableQA: Attention Supervision for Question Answering on Tables

Arxiv

4+阅读 · 2018年10月18日

Improving Question Answering by Commonsense-Based Pre-Training

Arxiv

5+阅读 · 2018年10月5日

Knowledge Based Machine Reading Comprehension

Knowledge Based Machine Reading Comprehension

Arxiv

4+阅读 · 2018年9月12日

QuAC : Question Answering in Context

QuAC : Question Answering in Context

Arxiv

4+阅读 · 2018年8月21日

微信扫码咨询专知VIP会员