Spatial data analytics systems are widely studied in both academia and industry. However, existing systems are limited when handling a large number of moving objects and real-time spatial queries. In this work, we architect CheetahGIS, a scalable and efficient system for processing streaming spatial queries over massive numbers of moving objects. In particular, CheetahGIS is built upon Apache Flink Stateful Functions (StateFun), an API for building distributed streaming applications with an actor-like model. CheetahGIS scales well because of its modular architecture, which cleanly decouples its components and allows each one to be scaled individually. To further improve the efficiency and scalability of CheetahGIS, we devise a suite of optimizations, e.g., a lightweight global grid-based index, metadata synchronization strategies, and load-balancing mechanisms. We also formulate a generic paradigm for spatial query processing in CheetahGIS, and verify its generality by processing three representative streaming queries (i.e., object query, range count query, and k-nearest-neighbor query). We conduct extensive experiments on both real and synthetic datasets to evaluate CheetahGIS.
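To make the three representative queries concrete, the following is a minimal single-process sketch of a uniform grid index over moving objects. It is an illustration under stated assumptions, not the CheetahGIS implementation: the abstract does not describe the index internals, and the class name `GridIndex`, its methods, and the `cell_size` parameter are all hypothetical. The distributed aspects (StateFun functions, metadata synchronization, load balancing) are deliberately left out.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform grid over 2D points (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (cx, cy) -> ids of objects in that cell
        self.positions = {}            # object id -> latest (x, y)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def update(self, oid, x, y):
        """Apply a location update, moving the object between cells if needed."""
        old = self.positions.get(oid)
        if old is not None:
            old_cell = self._cell(*old)
            self.cells[old_cell].discard(oid)
            if not self.cells[old_cell]:
                del self.cells[old_cell]
        self.positions[oid] = (x, y)
        self.cells[self._cell(x, y)].add(oid)

    def object_query(self, oid):
        """Return the latest known position of an object, or None."""
        return self.positions.get(oid)

    def range_count(self, xmin, ymin, xmax, ymax):
        """Count objects inside an axis-aligned rectangle (bounds inclusive)."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        count = 0
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for oid in self.cells.get((cx, cy), ()):
                    x, y = self.positions[oid]
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        count += 1
        return count

    def knn(self, qx, qy, k):
        """Return ids of the k nearest objects by expanding rings of cells."""
        if k <= 0:
            return []
        cx, cy = self._cell(qx, qy)
        found = []  # (distance, oid) for every object seen so far
        r = 0
        while len(found) < len(self.positions):
            for ix in range(cx - r, cx + r + 1):
                for iy in range(cy - r, cy + r + 1):
                    if max(abs(ix - cx), abs(iy - cy)) != r:
                        continue  # visit only the new outer ring
                    for oid in self.cells.get((ix, iy), ()):
                        x, y = self.positions[oid]
                        found.append((math.hypot(x - qx, y - qy), oid))
            found.sort()
            # Any object not yet seen lies at least r * cell_size away.
            if len(found) >= k and found[k - 1][0] <= r * self.cell_size:
                break
            r += 1
        return [oid for _, oid in found[:k]]
```

In this sketch a range count only inspects the cells overlapping the query rectangle, and a k-nearest-neighbor query stops expanding once the k-th candidate is provably closer than any unvisited cell. A distributed design would additionally have to partition the cells across workers, which this example does not attempt.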