We present a zero-shot segmentation approach for agricultural imagery that leverages Plantnet, a large-scale plant classification model, together with its DinoV2 backbone and the Segment Anything Model (SAM). Rather than collecting and annotating new datasets, our method exploits Plantnet's specialized plant representations to identify plant regions and produce coarse segmentation masks, which SAM then refines into detailed segmentations. We evaluate on four publicly available datasets that vary in complexity with respect to contrast, including some where limited training data and complex field conditions often hinder purely supervised methods. Across these datasets, the Plantnet-fine-tuned DinoV2 consistently outperforms the base DinoV2 model, as measured by the Jaccard Index (Intersection over Union, IoU). These findings highlight the potential of combining foundation models with specialized plant-centric models to alleviate the annotation bottleneck and enable effective segmentation in diverse agricultural scenarios.
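For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of the Jaccard Index (IoU) between a predicted binary mask and a reference mask. It is an illustration only, not the evaluation code used in this work: the function name `jaccard_index`, the nested-list mask representation, and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
from typing import Sequence

# A binary mask as rows of 0/1 (or False/True) values; hypothetical representation.
Mask = Sequence[Sequence[int]]


def jaccard_index(pred_mask: Mask, true_mask: Mask) -> float:
    """Return |pred AND true| / |pred OR true| for two binary masks."""
    same_rows = len(pred_mask) == len(true_mask)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred_mask, true_mask)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is plant in both masks
            union += p or t          # pixel is plant in at least one mask

    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return intersection / union


# Toy example: a 2x2 predicted region inside a 3x3 reference region.
coarse_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reference_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

# Intersection is 4 pixels and union is 9 pixels, so the score is 4 / 9.
print(jaccard_index(coarse_mask, reference_mask))
```

A score of 1.0 means the two masks coincide exactly, while 0.0 means they share no foreground pixels, which is why the metric is suited to comparing masks produced from different backbones against the same reference.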