Spatial clustering detection methods are widely used in many fields, including epidemiology, ecology, biology, physics, and sociology. In these fields, areal data are often of interest; such data may result from spatial aggregation (e.g., the number of disease cases in a county) or may be inherent attributes of the areal unit as a whole (e.g., the habitat suitability of a conserved land parcel). This study assesses the performance of two spatial clustering detection methods on areal data: the average nearest neighbor (ANN) ratio and Ripley's K function. These methods were designed for point process data, but their ease of implementation in GIS software (e.g., ESRI ArcGIS) and the lack of analogous methods for areal data have led to their widespread application to areal data. Despite this popularity, little research has explored their properties in the areal data setting. In this paper we conduct a simulation study to evaluate the performance of each method on areal data under various areal structures and types of spatial dependence. We find that the traditional approach to hypothesis testing with the ANN ratio or Ripley's K function yields inflated empirical type I error rates when applied to areal data. We demonstrate that this issue can be remedied for both methods by using Monte Carlo methods that account for the areal nature of the data when estimating the distribution of the test statistic under the null hypothesis. While such an approach is not currently implemented in ArcGIS, it is easily carried out in R using code provided by the authors.
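To make the contrast between the traditional test and an areal-aware Monte Carlo test concrete, here is a minimal Python sketch. It is not the authors' R code, and every function name is hypothetical. It assumes that each areal unit is represented by its centroid, that the statistic is the ANN ratio with the complete-spatial-randomness (CSR) expectation 0.5/sqrt(n/A) and standard error 0.26136/sqrt(n²/A) (the formulas given in ArcGIS's documentation for its Average Nearest Neighbor tool), and that the areal null hypothesis is "the k selected units are a random subset of the fixed units". The traditional z-score implicitly lets points fall anywhere in the study area. The Monte Carlo null instead only places them on existing unit centroids.

```python
import math
import random


def mean_nn_distance(points):
    """Mean Euclidean distance from each point to its nearest neighbour (brute force, O(n^2))."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        total += best
    return total / len(points)


def ann_ratio(points, area):
    """ANN ratio: observed mean NN distance divided by its CSR expectation 0.5 / sqrt(n / A)."""
    n = len(points)
    expected = 0.5 / math.sqrt(n / area)
    return mean_nn_distance(points) / expected


def csr_z_score(points, area):
    """Traditional z-score; assumes points could occur anywhere in the study area (CSR)."""
    n = len(points)
    expected = 0.5 / math.sqrt(n / area)
    se = 0.26136 / math.sqrt(n * n / area)
    return (mean_nn_distance(points) - expected) / se


def areal_monte_carlo_p(centroids, selected, area, n_sim=999, seed=0):
    """One-sided Monte Carlo p-value for clustering (small ANN ratio).

    Assumed null hypothesis (an assumption of this sketch): the selected units are
    a uniformly random subset of the fixed areal units, so simulated points can only
    sit on existing unit centroids, never elsewhere in the study area.
    Returns (observed ANN ratio, p-value).
    """
    rng = random.Random(seed)
    observed = ann_ratio([centroids[i] for i in selected], area)
    k = len(selected)
    as_extreme = 0
    for _ in range(n_sim):
        sample = rng.sample(centroids, k)  # random relabelling of which units are "selected"
        if ann_ratio(sample, area) <= observed:
            as_extreme += 1
    # Standard Monte Carlo p-value that counts the observed statistic as one draw.
    return observed, (as_extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical example: a 10 x 10 grid of unit-square areal units (A = 100),
    # with a 3 x 3 block of "selected" units in one corner.
    centroids = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]
    selected = [i * 10 + j for i in range(3) for j in range(3)]
    pts = [centroids[i] for i in selected]
    print("ANN ratio:", ann_ratio(pts, 100.0))
    print("CSR z-score:", csr_z_score(pts, 100.0))
    print("areal Monte Carlo (ratio, p):", areal_monte_carlo_p(centroids, selected, 100.0))
```

On a regular grid the nearest-neighbour distance between distinct centroids can never fall below the grid spacing, so the CSR reference distribution does not describe what is achievable under random selection of units. Simulating the null directly on the fixed units sidesteps this. The brute-force neighbour search is only meant for small numbers of units; a spatial index would be the usual choice at scale.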