Accurate building instance segmentation and height classification are critical for urban planning, 3D city modeling, and infrastructure monitoring. This paper presents a detailed analysis of YOLOv11, a recent addition to the YOLO series of deep learning models, focusing on its application to joint building extraction and discrete height classification from satellite imagery. YOLOv11 builds on the strengths of earlier YOLO models with a more efficient architecture that fuses multi-scale features more effectively, improves object localization accuracy, and performs better in complex urban scenes. Using the DFC2023 Track 2 dataset, which includes over 125,000 annotated buildings across 12 cities, we evaluate YOLOv11 in terms of precision, recall, F1 score, and mean average precision (mAP). Our findings show that YOLOv11 achieves strong instance segmentation performance, with 60.4\% mAP@50 and 38.3\% mAP@50--95, while maintaining robust classification accuracy across five predefined height tiers. The model handles occlusions, complex building shapes, and class imbalance well, particularly for rare high-rise structures. Comparative analysis confirms that YOLOv11 outperforms earlier multitask frameworks in both detection accuracy and inference speed, making it well suited for real-time, large-scale urban mapping. This research highlights YOLOv11's potential to advance semantic urban reconstruction through streamlined categorical height modeling and offers practical guidance for future work in remote sensing and geospatial intelligence.