The amount of scholarly data has increased dramatically in recent years. Newcomers to a particular scientific domain (e.g., IR, physics, NLP) often find it difficult to spot larger trends and to position the latest research in the context of prior scientific achievements and breakthroughs. Similarly, researchers in the history of science are interested in tools that let them analyze and visualize changes in particular scientific domains. Temporal summarization and related methods should therefore be useful for making sense of large volumes of scientific discourse data aggregated over time. We demonstrate a novel approach for analyzing collections of research papers published over long time periods, providing a high-level overview of the important semantic changes that occurred over time. Our approach is based on comparing semantic representations of words across time periods and aims to help users better understand large, domain-focused archives of scholarly publications. As an example dataset we use the ACL Anthology Reference Corpus, which spans 1979 to 2015 and contains 22,878 scholarly articles.
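The idea of comparing word representations over time can be illustrated with a minimal sketch. The abstract does not say how the representations are built, so the code below assumes simple co-occurrence count vectors as a stand-in; the function names (`cooccurrence_vector`, `cosine`, `semantic_change`), the window size, and the toy sentences are hypothetical and not taken from the paper or the corpus.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vector(docs, target, window=2):
    """Count the words that appear within `window` tokens of `target`.

    Assumption: whitespace tokenization and raw counts stand in for
    whatever word representation the actual system uses.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_change(docs_by_period, target, window=2):
    """Return 1 - cosine similarity for each pair of consecutive periods."""
    periods = sorted(docs_by_period)
    vectors = [cooccurrence_vector(docs_by_period[p], target, window)
               for p in periods]
    changes = {}
    for k in range(len(periods) - 1):
        pair = (periods[k], periods[k + 1])
        changes[pair] = 1.0 - cosine(vectors[k], vectors[k + 1])
    return changes


# Toy, made-up sentences keyed by period; not drawn from the real corpus.
toy_corpus = {
    1985: ["the parser uses grammar rules",
           "a chart parser applies grammar rules"],
    2015: ["the parser uses neural networks",
           "a neural parser learns embeddings"],
}

if __name__ == "__main__":
    for pair, score in semantic_change(toy_corpus, "parser").items():
        print(pair, round(score, 3))
```

A score near 0 means the word keeps similar contexts in both periods, while a score near 1 means its contexts have shifted. In practice one would more likely train dense embeddings per period and align them before comparison; the count-based version is used here only because it keeps the sketch dependency-free.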