Applications across many disciplines must now analyze fast-evolving big data in real time. Various approaches for incremental processing of queries have been proposed over the years. Traditional approaches update the result of a query as updates stream in rather than recomputing the query from scratch, and are therefore expected to execute faster. However, they do not perform well for large databases that are updated at high frequencies. New algorithms and approaches have therefore been proposed in the literature to address these challenges, for instance by reducing the complexity of processing updates. Moreover, many of these algorithms now leverage distributed streaming platforms such as Spark Streaming and Flink. In this tutorial, we briefly discuss legacy approaches for incremental query processing, and then give an overview of the new challenges introduced by processing big data streams. We then discuss in detail the recently proposed algorithms that address some of these challenges. We emphasize the characteristics and algorithmic analysis of the proposed approaches and conclude by discussing future research directions.
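As a rough illustration of the contrast between recomputation and incremental processing, the following minimal sketch maintains a grouped sum under streamed inserts and deletes. It is not taken from the tutorial: the names `recompute`, `IncrementalSum`, and `apply`, the `(key, amount)` row layout, and the `+1`/`-1` multiplicity encoding are all assumptions chosen for illustration, and the query (`SUM(amount) GROUP BY key`) is a deliberately simple stand-in for the queries the surveyed algorithms target.

```python
from collections import defaultdict


def recompute(rows):
    """Baseline: rescan every stored row to rebuild SUM(amount) GROUP BY key."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


class IncrementalSum:
    """Hypothetical view that keeps SUM(amount) GROUP BY key up to date."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)  # rows per key, used to drop empty groups

    def apply(self, key, amount, multiplicity):
        # multiplicity is +1 for an insert and -1 for a delete (an assumption
        # of this sketch); each update touches one group, not the whole table.
        self.totals[key] += multiplicity * amount
        self.counts[key] += multiplicity
        if self.counts[key] == 0:
            del self.totals[key]
            del self.counts[key]

    def result(self):
        return dict(self.totals)


# A small update stream: three inserts followed by one delete.
updates = [("a", 5, +1), ("b", 3, +1), ("a", 2, +1), ("b", 3, -1)]

view = IncrementalSum()
rows = []  # the base table, kept only so the baseline can be recomputed
for key, amount, multiplicity in updates:
    if multiplicity > 0:
        rows.append((key, amount))
    else:
        rows.remove((key, amount))
    view.apply(key, amount, multiplicity)
    # The maintained view should always agree with a full recomputation.
    assert view.result() == recompute(rows)
```

In this sketch each update costs constant time regardless of table size, whereas `recompute` scans every stored row. Queries with joins do not decompose this easily, which is one reason update complexity is a central concern for the algorithms discussed.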