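Project title: Sketch-Based Real-Time Processing of Network Streaming Data Across Different Temporal and Spatial Domains
Project number: No.61502098
Project type: Young Scientists Fund
Year of approval: 2016
Discipline: Automation technology; computer technology
Project author: 肖卿俊 (Xiao Qingjun)
Author affiliation: Southeast University (东南大学)
Funding: 210,000 CNY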
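Abstract (translated from Chinese): In recent years, with the development of computer networks and sensor technology, data have been collected from every corner of the world, and data volumes have grown rapidly. To meet the demand for real-time data processing, researchers proposed the streaming data model: a stream is a sequence of data items, each of which may be read only once during processing, using only space-limited high-speed storage. Streaming data processing is widely used in practical systems, for example in backbone network traffic analysis, wireless electronic tag monitoring, and search engine data analysis. However, existing algorithms focus mainly on real-time analysis of a single data stream, and remain insufficiently developed for the parallel processing of massive numbers of data streams and for the correlation analysis of data streams from multiple temporal and spatial domains. This project therefore proposes an in-depth study of the key techniques and theory of stream processing across temporal and spatial domains. Subject to application requirements, it will design a set of distributed real-time stream processing mechanisms and algorithms, covering aggregate information mining over massive data streams, correlation analysis and pattern recognition across data streams from different time periods, and cooperation among stream processing nodes in different spatial domains. Building on these staged results, the applicant will develop a stream processing algorithm library and validate in practice the correctness and effectiveness of the proposed methods.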
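Keywords (translated from Chinese): stream data processing; real-time data processing; network traffic analysis; network traffic matrix; anomalous network behavior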
English abstract: In recent decades, with the rapid development of sensor technology and computer networks, an unprecedented amount of data has been collected from all corners of the world. To process the collected data in real time, researchers have defined the concept of a data stream, which is an ordered sequence of data items. A data stream must be processed in a single pass, using only limited storage space for encoding and decoding information. Owing to its high processing efficiency and low memory cost, data stream technology has been adopted by numerous real-world applications, such as traffic measurement in high-speed networks and data compression for wireless sensor networks. However, existing streaming algorithms mainly focus on real-time mining of a single data stream. As far as we know, they are still inadequate with regard to the simultaneous processing of multiple input streams, especially when the streams are generated in different temporal and spatial domains. Therefore, the goal of our project is to develop techniques that enable the mining of multiple data streams (at different times and locations) for complex temporal and spatial semantic information. Specifically, we will develop a package of algorithms that serves the following three purposes: (1) extract aggregate knowledge efficiently from a huge number of data streams; (2) analyze the correlation between data stream records from different time periods; (3) coordinate stream processing engines at different locations in order to construct a summary of distributed streaming data. Building on these achievements, we will build an algorithm library for distributed stream processing and use real network data traces to verify the correctness and effectiveness of the proposed algorithms.
English keywords: Stream Data Processing;Real-Time Data Processing;Traffic Analysis;Network Flow Matrix;Abnormal Network Behavior
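
Illustrative note: the single-pass, bounded-memory model described in the abstract, and the idea of combining summaries built at different locations, can be sketched with a Count-Min sketch. This is a minimal illustration under stated assumptions, not the project's own algorithm: the record does not name a specific sketch, and the class name `CountMinSketch`, its `width` and `depth` parameters, and the sample addresses are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary of a stream (hypothetical illustration)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # Memory use is width * depth counters, independent of stream length.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One salted hash per row; the row number serves as the salt.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Each stream item is seen once and updates one counter per row.
        for r in range(self.depth):
            self.rows[r][self._index(r, item)] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper-biased estimate that never falls below the true count.
        return min(self.rows[r][self._index(r, item)] for r in range(self.depth))

    def merge(self, other):
        # Sketches with identical dimensions and hashes add counter-wise,
        # so summaries built at different sites can be combined.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        for r in range(self.depth):
            for c in range(self.width):
                self.rows[r][c] += other.rows[r][c]


# Two measurement points summarize their own streams, then combine them.
site_a = CountMinSketch()
site_b = CountMinSketch()
for src in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    site_a.add(src)
for src in ["10.0.0.1", "10.0.0.3"]:
    site_b.add(src)
site_a.merge(site_b)
print(site_a.estimate("10.0.0.1"))  # never below the true count of 3
```

The counter-wise merge is what makes this kind of summary convenient for distributed settings: each node keeps only its own fixed-size table, and only the tables, not the raw streams, need to be exchanged.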