In this work, we present an application for the diachronic analysis of research corpora, aimed at the NLP community and the wider research community. We open-source an easy-to-use tool, DRIFT, which allows researchers to track research trends and developments over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added. They include, among others, keyword extraction, word clouds, predicting declining, stagnant, or growing trends using Productivity, tracking bi-grams using Acceleration plots, finding the Semantic Drift of words, and tracking trends using similarity. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods. The toolkit and the associated code are available here: https://github.com/rajaswa/DRIFT.