Mapping of spatial hotspots, i.e., regions with significantly higher rates or probability density of generating certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, and environmental science. The clustering techniques these domains require differ from traditional clustering methods because spurious results (e.g., false alarms of crime clusters) carry high economic and social costs. As a result, statistical rigor is needed to explicitly control the rate of spurious detections. To address this challenge, techniques for statistically robust clustering have been extensively studied by the data mining and statistics communities. In this survey, we present an up-to-date and detailed review of the models and algorithms developed in this field. We first present a general taxonomy of the clustering process with statistical rigor, covering the key steps of data and statistical modeling, region enumeration and maximization, significance testing, and data update. We then discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone for generating new ideas in this growing field and beyond.