Mentorship in science is crucial for topic choice, career decisions, and the success of mentees and mentors. Researchers who study mentorship typically rely on article co-authorship and doctoral dissertation datasets. However, available datasets of this type cover only a narrow selection of fields and miss early-career and non-publication-related interactions. Here, we describe MENTORSHIP, a crowdsourced dataset of 743,176 mentorship relationships among 738,989 scientists across 112 fields that avoids these shortcomings. We enrich the scientists' profiles with publication data from the Microsoft Academic Graph and with "semantic" representations of research obtained through deep learning content analysis. Because gender and race have become critical dimensions when analyzing mentorship and disparities in science, we also provide estimates of these factors. We perform extensive validations of the profile-publication matching, semantic content, and demographic inferences. We anticipate that this dataset will spur the study of mentorship in science and deepen our understanding of its role in scientists' career outcomes.