Today, we have to deal with large volumes of data (Big Data), and we need to choose an architectural framework to analyze data coming from different areas. Processing this data becomes problematic, and even more so when the data is continuous. Traditionally, data is first received, then stored, and only then queried; this is what we call batch processing. It works well for large amounts of data, but it reaches its limits when fast (or real-time) results are required, for example for financial trades, sensor readings, or user session activity. The solution to this problem is stream processing. In the stream processing approach, data arrives record by record and is processed directly rather than being stored first. Results are therefore expected immediately, with a latency that may vary but should remain close to real time. In this paper, we propose a quality assessment model to evaluate and choose stream processing frameworks. We briefly describe different architectural frameworks that address stream processing, such as Kafka, Spark Streaming, and Flink. Using our quality model, we present a decision tree that helps engineers choose a framework according to its quality aspects. Finally, we evaluate our model with a case study on Twitter and Netflix streaming.
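The contrast between batch and stream processing described above can be illustrated with a minimal sketch. It is not taken from the paper or from any of the frameworks it names; the function names `batch_average` and `stream_average` and the sample readings are hypothetical, chosen only to show the difference between "store, then query" and "process each record on arrival".

```python
from typing import Iterable, Iterator, List


def batch_average(records: Iterable[float]) -> float:
    """Batch style (hypothetical helper): store every record, then query."""
    stored: List[float] = list(records)  # receive and store everything first
    return sum(stored) / len(stored)     # query only once storage is complete


def stream_average(records: Iterable[float]) -> Iterator[float]:
    """Stream style (hypothetical helper): update the result per record."""
    count = 0
    total = 0.0
    for value in records:
        count += 1
        total += value
        yield total / count  # an up-to-date result exists after each record


if __name__ == "__main__":
    readings = [10.0, 20.0, 30.0]  # illustrative sensor readings
    print(batch_average(readings))
    print(list(stream_average(readings)))
```

In the batch version no result exists until all records have been collected, whereas the streaming version keeps only a running count and total and yields an updated average after every record. Real frameworks add distribution, fault tolerance, and windowing on top of this basic idea.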