In this paper, we present a system that automatically constructs a knowledge base from large-scale enterprise documents with minimal human intervention. In designing and deploying such a knowledge mining system for the enterprise, we faced several challenges, including data distribution shift, performance evaluation, compliance requirements, and other practical issues. We leveraged state-of-the-art deep learning models to extract information (named entities and their definitions) at the per-document level, and then applied classical machine learning techniques to global statistics gathered across documents to improve the knowledge base. We report experimental results on real enterprise documents. The system currently serves as part of a Microsoft 365 service.