Keeping up with the research literature plays an important role in the workflow of scientists: it allows them to understand a field, formulate the problems they focus on, and develop the solutions they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science is a field seeing an exponential rise in papers, one that increasingly draws on, and is applied in, numerous diverse disciplines. Recent efforts have produced several tools intended to help data scientists cope with the deluge of research, along with coordinated efforts to develop AI tools intended to uncover the research frontier. Although these trends indicate the information overload faced by data scientists, no prior work has examined the specific practices and challenges of these scientists in an interdisciplinary field with evolving scholarly norms. In this paper, we close this gap through a set of semi-structured interviews and think-aloud protocols with industry and academic data scientists (N = 20). Our results, while corroborating the practices of other knowledge workers, uncover several novel findings: individuals (1) are challenged in seeking and sensemaking of papers beyond their disciplinary bubbles, (2) struggle to understand papers in the face of missing details and mathematical content, (3) grapple with the deluge by leveraging the knowledge context in code, blogs, and talks, and (4) lean on their peers online and in person. Furthermore, we outline future directions likely to help data scientists cope with the burgeoning research literature.