Distributed networks and real-time systems are becoming central components of the new computing age, the Internet of Things (IoT), in which sensors and existing legacy systems generate huge data streams and data sets. These data make it possible to measure, infer and understand environmental indicators, from delicate ecologies and natural resources to urban environments, through the analysis of heterogeneous (structured and unstructured) data sources. In this paper, we propose a distributed framework, Event STream Processing Engine for Environmental Monitoring Domain (ESTemd), that applies stream processing to heterogeneous environmental data. Our work in this area demonstrates the useful role big data techniques can play in environmental decision support, early warning and forecasting systems. The proposed framework addresses two challenges (the heterogeneity of data coming from disparate systems, and the real-time processing of huge environmental datasets) through a publish/subscribe approach built on a unified data pipeline that uses Apache Kafka for real-time analytics.
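The publish/subscribe idea behind a unified data pipeline can be illustrated with a small sketch. The code below is not ESTemd and does not use Kafka's client API. It is a minimal in-memory stand-in, assuming hypothetical topic names (`raw.csv`, `raw.json`, `observations`), a hypothetical common `Observation` schema, and a hypothetical water-level threshold for an early-warning subscriber. Source-specific adapters subscribe to raw topics, normalize each message into the common schema, and republish it on a single unified topic that downstream analytics consume.

```python
import csv
import io
import json
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical unified record that every source is normalized into."""
    source: str
    sensor_id: str
    variable: str
    value: float
    timestamp: str


class Broker:
    """In-memory topic-based pub/sub broker (a stand-in for Kafka topics, not its API)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver synchronously to every handler registered on this topic.
        for handler in self._subscribers[topic]:
            handler(message)


def normalize_csv(line: str) -> Observation:
    # Assumed legacy structured format: "sensor_id,variable,value,timestamp".
    sensor_id, variable, value, ts = next(csv.reader(io.StringIO(line)))
    return Observation("legacy_csv", sensor_id, variable, float(value), ts)


def normalize_json(payload: str) -> Observation:
    # Assumed IoT JSON format with keys "id", "type", "reading", "time".
    d = json.loads(payload)
    return Observation("iot_json", d["id"], d["type"], float(d["reading"]), d["time"])


def build_pipeline(broker: Broker) -> None:
    # Each adapter converts its raw source format and republishes on the unified topic.
    broker.subscribe("raw.csv", lambda m: broker.publish("observations", normalize_csv(m)))
    broker.subscribe("raw.json", lambda m: broker.publish("observations", normalize_json(m)))


# Illustrative wiring with made-up input messages.
broker = Broker()
build_pipeline(broker)

received: List[Observation] = []
alerts: List[Observation] = []
WATER_LEVEL_THRESHOLD = 3.0  # hypothetical early-warning threshold


def early_warning(obs: Observation) -> None:
    # Downstream consumer that only sees the unified schema, whatever the source.
    if obs.variable == "water_level" and obs.value > WATER_LEVEL_THRESHOLD:
        alerts.append(obs)


broker.subscribe("observations", received.append)
broker.subscribe("observations", early_warning)

broker.publish("raw.csv", "S1,water_level,3.4,2020-01-01T00:00:00Z")
broker.publish("raw.json", '{"id": "S2", "type": "temperature", "reading": 12.5, "time": "2020-01-01T00:05:00Z"}')
```

The design choice this sketch highlights is decoupling: producers only know their raw topic and consumers only know the unified topic, so adding a new heterogeneous source means adding one adapter rather than changing every analytics consumer. In a real deployment the broker would be a Kafka cluster with durable, partitioned topics rather than synchronous in-process callbacks.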