We collected and cleaned a large data set on publications in statistics. The data set consists of the coauthor relationships and citation relationships of 83,331 papers published in 36 representative journals in statistics, probability, and machine learning, spanning 41 years. The data set allows us to construct many different networks, and it motivates a number of research problems about the research patterns and trends, research impacts, and network topology of the statistics community. In this paper, we focus on (i) using the citation relationships to estimate the research interests of authors, and (ii) using the coauthor relationships to study the network topology. Using the co-citation networks we constructed, we discover a "statistics triangle", reminiscent of the statistical philosophy triangle (Efron, 1998). We propose new approaches to constructing the "research map" of statisticians, as well as the "research trajectory" of a given author, which visualizes the evolution of his/her research interests. Using the co-authorship networks we constructed, we discover a multi-layer community tree and produce a Sankey diagram to visualize author migrations among different sub-areas. We also propose several new metrics for the research diversity of individual authors. We find that "Bayes", "Biostatistics", and "Nonparametric" are three primary areas in statistics. We also identify 15 sub-areas, each of which can be viewed as a weighted average of the primary areas, and identify several underlying reasons for the formation of co-authorship communities. Finally, we find that the research interests of statisticians have evolved significantly over the 41-year window we studied: some areas (e.g., biostatistics and high-dimensional data analysis) have become increasingly popular.
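The abstract does not spell out how the co-citation networks are built. As a rough illustration only, the sketch below uses the textbook definition of co-citation: two items are linked whenever a third paper cites both, and the edge weight counts how many papers do so. The input format (a dict from a citing paper ID to the IDs it cites) and the function name `cocitation_counts` are hypothetical; the cited items could be papers or authors, and the paper's actual construction (normalization, thresholds, author-level aggregation) may differ.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count how often each pair of cited items appears together in one reference list.

    references: dict mapping a citing paper ID to the IDs it cites
    (hypothetical input format, for illustration only).
    Returns a Counter keyed by sorted (item_a, item_b) tuples.
    """
    counts = Counter()
    for cited in references.values():
        # De-duplicate and sort so each unordered pair is counted once per citing paper
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


refs = {
    "p1": ["A", "B", "C"],
    "p2": ["A", "B"],
    "p3": ["B", "C"],
}
weights = cocitation_counts(refs)
print(weights[("A", "B")])  # 2: A and B are co-cited by p1 and p2
```

The resulting weighted edge list can be fed to any graph or clustering routine; sorting each pair keeps the network undirected without double counting.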