The experiment data generated by the EAST device keeps growing, which makes it necessary to monitor the MDSplus data storage server on EAST. To facilitate the management of users on the MDSplus server, a real-time log monitoring and analysis system is needed. The system uses Spark Streaming, part of the Spark ecosystem, as its data processing framework, and its real-time stream is derived from MDSplus logs. It also relies on frameworks such as Flume and Kafka for log monitoring, aggregation, and distribution, which give it the capacity to process MDSplus log data at scale. The system can process tens of millions of raw MDSplus log entries within seconds, then model the log information and display it on the web. This report introduces the design and implementation of the overall architecture of the Spark-based real-time data access log analysis system. Experimental results show that the system performs steadily and reliably and has important application value for the management of fusion experiment data. The system has been designed and will be adopted in the next campaign; system details are given in the paper.
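As a rough illustration of the processing step described above, the following minimal sketch groups parsed log lines into fixed-length batches and counts accesses per user in each batch, which is the kind of aggregation a Spark Streaming micro-batch job could perform. It is plain Python standing in for the streaming framework, not the system's actual code. The log line format (`timestamp user tree shot`), the `AccessEvent` fields, and the function names are all assumptions made for illustration; the real MDSplus log layout is not specified here.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class AccessEvent(NamedTuple):
    """One data access record (hypothetical fields, not the real MDSplus log schema)."""
    timestamp: float  # seconds
    user: str
    tree: str
    shot: int


def parse_line(line: str) -> Optional[AccessEvent]:
    """Parse a line in the assumed format 'timestamp user tree shot'.

    Returns None for lines that do not match, so malformed input is skipped.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        return AccessEvent(float(parts[0]), parts[1], parts[2], int(parts[3]))
    except ValueError:
        return None


def micro_batches(
    lines: Iterable[str], batch_seconds: float = 1.0
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (batch_start, accesses_per_user) for consecutive time batches.

    Assumes lines arrive in timestamp order, as they would from a log tail.
    """
    current_start: Optional[float] = None
    counts: Counter = Counter()
    for line in lines:
        event = parse_line(line)
        if event is None:
            continue  # drop malformed lines
        start = (event.timestamp // batch_seconds) * batch_seconds
        if current_start is not None and start != current_start:
            # The previous batch is complete; emit it and start a new one.
            yield current_start, dict(counts)
            counts = Counter()
        current_start = start
        counts[event.user] += 1
    if current_start is not None:
        yield current_start, dict(counts)


if __name__ == "__main__":
    sample = [
        "100.1 alice east 80001",
        "100.7 bob east 80001",
        "100.9 alice pcs_east 80002",
        "not a valid line",
        "101.2 bob east 80003",
    ]
    for batch_start, per_user in micro_batches(sample, batch_seconds=1.0):
        print(batch_start, per_user)
```

In a deployment along the lines the abstract describes, the input iterable would be replaced by a stream consumed from Kafka, and the per-batch counts would be handed to the web display layer rather than printed.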