In recent years, with the rapid development of sensing technology and the Internet of Things (IoT), sensors have come to play increasingly important roles in traffic control, medical monitoring, industrial production, and other fields. They generate high volumes of streaming data that often need to be processed in real time. Streaming data computing technology therefore plays an indispensable role in processing sensor data in real time with high throughput and low latency. To address this need, we propose a framework implemented on top of Spark Streaming. It comprises a gray-model-based traffic flow monitor, a traffic-prediction-oriented prediction layer, and a fuzzy-control-based layer that dynamically adjusts the Spark Streaming Batch Interval. The framework forecasts variation in the sensor data arrival rate, adjusts the streaming Batch Interval in advance, and performs real-time stream processing at the edge. It can thus monitor and predict changes in the flow of autonomous driving vehicle sensor data within the geographical coverage of an edge computing node, while minimizing end-to-end latency and still satisfying the application's throughput requirements. Experiments show that it predicts short-term traffic with a relative error of no more than 4% over a whole day. By keeping the batch consuming rate close to the data generating rate, it maintains system stability well even when the data arrival rate changes rapidly. The Batch Interval converges to a suitable value within two minutes when the data arrival rate doubles. Compared with vanilla Spark Streaming, which suffers from serious task accumulation and large delays, the framework reduces latency by 35% by shrinking the Batch Interval when the data arrival rate is low, and it significantly improves system throughput with a Batch Interval increase of at most 25% when the data arrival rate is high.
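To illustrate the gray-model-based prediction of the data arrival rate mentioned in the abstract, the following is a minimal sketch of a textbook GM(1,1) forecaster in plain Python. It is not the implementation used in the framework: the function name `gm11_forecast`, the window of recent arrival rates, and the minimum window length of four samples are all assumptions made for illustration only.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1).

    `series` is a hypothetical window of recent arrival rates
    (e.g. records per second in consecutive batches).
    """
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) is usually fitted on at least 4 samples")

    # Accumulated generating operation (1-AGO): running sum of the series.
    x1 = []
    total = 0.0
    for value in series:
        total += value
        x1.append(total)

    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = [float(v) for v in series[1:]]

    # Least-squares fit of y = -a * z + b (ordinary two-parameter regression).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    den = m * szz - sz * sz
    if den == 0:
        # Degenerate window (e.g. all zeros): fall back to the last value.
        return [float(series[-1])] * steps
    slope = (m * szy - sz * sy) / den
    a = -slope                 # development coefficient
    b = (sy - slope * sz) / m  # gray input

    if abs(a) < 1e-12:
        # No growth or decay detected: the model reduces to a constant.
        return [b] * steps

    def x1_hat(k):
        # Time response of the accumulated series at index k.
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    # Undo the accumulation to get forecasts of the original series.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]
```

For example, `gm11_forecast([100, 110, 121, 133.1, 146.41])` returns a one-element list whose value is close to the next term of that geometric sequence. In a setting like the one described in the abstract, such a forecast of the arrival rate could be passed to the layer that adjusts the Batch Interval; how the framework actually couples the two is not specified here.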