DBLP is the largest open-access repository of scientific articles in computer science and provides metadata on publications, authors, and venues. We retrieved more than 6 million publications from DBLP and extracted pertinent metadata (e.g., abstracts, author affiliations, citations) from the publication texts to create the DBLP Discovery Dataset (D3). D3 can be used to identify trends in the research activity, productivity, focus, bias, accessibility, and impact of computer science research. We present an initial analysis focused on the volume of computer science research (e.g., number of papers, authors, research activity), trends in topics of interest, and citation patterns. Our findings show that computer science is a growing research field (approx. 15% annually) with an active and collaborative researcher community. Although papers from recent years contain more bibliographic entries than those from previous decades, the average number of citations has been declining. An investigation of the papers' abstracts reveals that recent topic trends are clearly reflected in D3. Finally, we list further applications of D3 and pose supplemental research questions. The D3 dataset, our findings, and source code are publicly available for research purposes.