Practices in the built environment have become increasingly digitalized with the rapid development of modern design and construction technologies. However, practitioners and scholars still lack an effective way to gather the complex professional knowledge of this field. In this paper, more than 80,000 paper abstracts from the built environment field were collected to build a knowledge graph, a knowledge base that stores entities and the relations connecting them in a graph-structured data model. To ensure that entities and relations are retrieved accurately, two well-annotated datasets were created for the named entity recognition task and the relation extraction task, containing 2,000 and 1,450 instances, respectively, across 29 relations. Each task was solved by a BERT-based model trained on the corresponding proposed dataset. Both models attained an accuracy above 85% on their tasks. Applying these models to all of the abstracts yielded more than 200,000 high-quality relations and entities. Finally, the knowledge graph is presented through a self-developed visualization system that reveals the relations between various entities in the domain. Both the source code and the annotated datasets can be found here: https://github.com/HKUST-KnowComp/BEKG.