We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
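The property that enables inference-time generalization is that the upsampler's learned weights never act on the encoder-specific feature channels, only on image-derived guidance, so the same module can upsample features of any width. The following minimal PyTorch sketch illustrates what such a feature-agnostic interface could look like; the class name, the cross-attention design, and all hyperparameters are illustrative assumptions, not the actual AnyUp architecture.

```python
# Hypothetical sketch of a feature-agnostic upsampler interface.
# FeatureAgnosticUpsampler and guidance_dim are illustrative names,
# not part of the AnyUp implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAgnosticUpsampler(nn.Module):
    """Upsamples features of any channel dimension, guided by the RGB image.

    Learnable parameters only process image-derived guidance, so the same
    weights can be reused for DINO, CLIP, or other encoder features.
    """

    def __init__(self, guidance_dim: int = 32):
        super().__init__()
        # Small CNN that turns the RGB image into per-pixel guidance embeddings.
        self.guidance = nn.Sequential(
            nn.Conv2d(3, guidance_dim, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(guidance_dim, guidance_dim, 3, padding=1),
        )

    def forward(self, image: torch.Tensor, lr_feats: torch.Tensor) -> torch.Tensor:
        # image:    (B, 3, H, W)  full-resolution RGB
        # lr_feats: (B, C, h, w)  low-resolution features from ANY encoder
        B, C, h, w = lr_feats.shape
        H, W = image.shape[-2:]

        # High-res queries and low-res keys, both computed from the image only.
        q = self.guidance(image)                                  # (B, D, H, W)
        k = F.adaptive_avg_pool2d(q, (h, w))                      # (B, D, h, w)

        # Cross-attention from each output pixel to all low-res locations.
        q = q.flatten(2).transpose(1, 2)                          # (B, HW, D)
        k = k.flatten(2)                                          # (B, D, hw)
        v = lr_feats.flatten(2).transpose(1, 2)                   # (B, hw, C)
        attn = torch.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)   # (B, HW, hw)

        # Weighted average of low-res features; channel count is untouched.
        hr = attn @ v                                             # (B, HW, C)
        return hr.transpose(1, 2).reshape(B, C, H, W)


if __name__ == "__main__":
    upsampler = FeatureAgnosticUpsampler()
    image = torch.randn(1, 3, 224, 224)
    dino_feats = torch.randn(1, 768, 16, 16)   # ViT-B-style feature width
    clip_feats = torch.randn(1, 512, 16, 16)   # a different channel width
    # The same module handles both feature types without modification.
    print(upsampler(image, dino_feats).shape)  # torch.Size([1, 768, 224, 224])
    print(upsampler(image, clip_feats).shape)  # torch.Size([1, 512, 224, 224])
```

Because the attention weights are computed purely from the image, the low-resolution features pass through as attention values of arbitrary width; this is one way a single set of trained weights could transfer across feature extractors at inference time, in the spirit of the abstract above.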