In this report, we present the first place solution to the ECCV 2024 BRAVO Challenge, where a model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets. Our solution leverages the powerful representations learned by vision foundation models, by attaching a simple segmentation decoder to DINOv2 and fine-tuning the entire model. This approach outperforms more complex existing approaches, and achieves first place in the challenge. Our code is publicly available at https://github.com/tue-mps/benchmark-vfm-ss.
翻译:暂无翻译