The Software Engineering (SE) community is prolific, which makes it challenging for experts to keep up with the flood of new papers and for neophytes to enter the field. We therefore posit that the community may benefit from a tool that extracts terms and their interrelations from the SE community's text corpus and shows the terms' trends. In this paper, we build a prototype tool using word embeddings. We train the embeddings on the SE Body of Knowledge handbook and on the titles and abstracts of 15,233 research papers. We also create the test cases needed to validate the training of the embeddings. We provide representative examples showing that the embeddings may aid in summarizing terms and uncovering trends in the knowledge base.