We present two large datasets of labelled plant images suited to training machine learning and computer vision models. At the time of writing, the first dataset comprises over 1.2 million images of indoor-grown crops and weeds common to the Canadian Prairies and many US states. The second dataset consists of over 540,000 images of plants photographed in farmland. All indoor plant images are labelled by species, and we provide rich metadata at the level of individual images. This comprehensive database allows users to filter the datasets by user-defined criteria such as crop type or plant age. Furthermore, the indoor dataset contains images of plants taken from a wide variety of angles, including profile shots, top-down shots, and angled perspectives. The field images are all taken from a top-down perspective and usually contain multiple plants per image. Metadata is also available for these images. In this paper we describe both datasets' characteristics with respect to plant variety, plant age, and number of images. We further introduce an open-access sample of the indoor dataset that contains 1,000 images of each species covered in our dataset. These 14,000 images in total were selected to form a representative sample with respect to plant age and individual plants per species. This sample serves as a quick entry point for new users, allowing them to explore the data on a small scale and identify the data parameters most useful for their application without having to handle hundreds of thousands of individual images.