Energy research is of crucial public importance, but the use of computer science technologies such as automatic text processing and data management in the energy domain is still rare. Employing these technologies in the energy domain will be a significant contribution to the interdisciplinary topic of ``energy informatics'', much like the related progress within the interdisciplinary area of ``bioinformatics''. In this paper, we present the architecture of a Web-based semantic system called EneMonIE (Energy Monitoring through Information Extraction) for monitoring up-to-date energy trends through automatic, continuous, and guided information extraction from diverse types of media available on the Web. The types of media handled by the system will include online news articles, social media texts, online news videos, and open-access scholarly papers and technical reports, as well as various numeric energy data made publicly available by energy organizations. The system will utilize and contribute to energy-related ontologies, and its ultimate form will comprise components for (i) text categorization, (ii) named entity recognition, (iii) temporal expression extraction, (iv) event extraction, (v) social network construction, (vi) sentiment analysis, (vii) information fusion and summarization, (viii) media interlinking, and (ix) Web-based information retrieval and visualization. With its diverse data sources, automatic text processing capabilities, and presentation facilities open for public use, EneMonIE will be an important source of distilled and concise information for decision-makers, including energy generation, transmission, and distribution system operators, energy research centres, and related investors and entrepreneurs, as well as for academicians, students, and other individuals interested in the pace of energy events and technologies.