Recent applications employ publish/subscribe (Pub/Sub) systems so that publishers can easily attract the attention of customers and subscribers can monitor useful information generated by publishers. Due to the prevalence of smart devices and social networking services, a large number of objects that contain both spatial and keyword information are generated continuously, and the number of subscribers also continues to increase. This poses a challenge to Pub/Sub systems: they need to continuously extract useful information from a massive number of objects for each subscriber in real time. In this paper, we address the problem of k nearest neighbor (kNN) monitoring on a spatial-keyword data stream for a large number of subscriptions. To scale well to massive numbers of objects and subscriptions, we propose a distributed solution, namely DkM-SKS. Given m workers, DkM-SKS divides a set of subscriptions into m disjoint subsets based on a cost model so that each worker has almost the same kNN-update cost, which keeps the load balanced. DkM-SKS allows an arbitrary approach to updating the kNN of each subscription, so with a suitable in-memory index, DkM-SKS can accelerate updates by pruning subscriptions that are irrelevant to a given new object. We conduct experiments on real datasets, and the results demonstrate the efficiency and scalability of DkM-SKS.
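
The abstract does not state the cost model or the partitioning algorithm, so the following is only a minimal sketch of the load-balancing idea. It assumes each subscription already has an estimated kNN-update cost (the `costs` mapping is hypothetical) and substitutes a generic longest-processing-time greedy heuristic, which is not necessarily what DkM-SKS does. The names `partition_by_cost` and `worker_loads` are illustrative.

```python
import heapq
from typing import Dict, List, Tuple


def partition_by_cost(costs: Dict[str, float], m: int) -> List[List[str]]:
    """Split subscriptions into m disjoint subsets with similar total cost.

    Assumption: costs[sub_id] is an estimated kNN-update cost supplied by
    some cost model (not specified here). Generic greedy heuristic: take
    subscriptions from most to least expensive and give each one to the
    worker whose assigned cost is currently smallest.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    # Min-heap of (total assigned cost, worker id).
    heap: List[Tuple[float, int]] = [(0.0, w) for w in range(m)]
    heapq.heapify(heap)
    parts: List[List[str]] = [[] for _ in range(m)]
    # Sort by descending cost; break ties by id so the result is deterministic.
    for sub_id, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, w = heapq.heappop(heap)
        parts[w].append(sub_id)
        heapq.heappush(heap, (load + cost, w))
    return parts


def worker_loads(costs: Dict[str, float], parts: List[List[str]]) -> List[float]:
    """Total estimated cost assigned to each worker."""
    return [sum(costs[s] for s in part) for part in parts]
```

For example, with hypothetical costs `{"s1": 5.0, "s2": 4.0, "s3": 3.0, "s4": 3.0, "s5": 2.0, "s6": 1.0}` and `m = 2`, this heuristic assigns a total cost of 9.0 to each worker. In general, a greedy assignment like this gives only approximately equal loads.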