With the rise of high-resolution remote sensing technologies, there has been an explosion in the amount of data available for forest monitoring, and an accompanying growth in artificial intelligence applications that automatically derive forest properties of interest from these datasets. Many studies use their own data at small spatio-temporal scales and demonstrate the application of an existing or adapted data science method to a particular task. This approach often involves intensive and time-consuming data collection and processing, yet generates results restricted to specific ecosystems and sensor types. There is little widespread acknowledgement of how the types and structures of the data used affect the performance and accuracy of analysis algorithms. To accelerate progress in the field more efficiently, benchmarking datasets upon which methods can be tested and compared are sorely needed. Here, we discuss how a lack of standardisation impacts confidence in the estimation of key forest properties, and how data collection considerations need to be accounted for when assessing method performance. We present pragmatic requirements and considerations for the creation of rigorous, useful benchmarking datasets for forest monitoring applications, and discuss how tools from modern data science can improve the use of existing data. We list a set of example large-scale datasets that could contribute to benchmarking, and present a vision for how community-driven, representative benchmarking initiatives could benefit the field.