Herbarium sheets offer a unique view of the world's botanical history, evolution, and diversity, which makes them an essential data source for botanical research. The increasing digitisation of herbaria worldwide, together with advances in fine-grained classification that can facilitate automatic identification of herbarium specimens, creates many opportunities to support research in this field. However, existing datasets are either too small or not diverse enough in terms of represented taxa, geographic distribution, or host institutions. Furthermore, aggregating multiple datasets is difficult because taxa exist under a multitude of different names, and the taxonomy must be aligned to a common reference. We present the Herbarium Half-Earth dataset, the largest and most diverse dataset of herbarium specimens to date for automatic taxon recognition.