Current generative super-resolution methods perform well on natural images but distort text, creating a fundamental trade-off between image quality and textual readability. To address this, we introduce \textbf{TIGER} (\textbf{T}ext-\textbf{I}mage \textbf{G}uided sup\textbf{E}r-\textbf{R}esolution), a novel two-stage framework that breaks this trade-off through a \textit{``text-first, image-later''} paradigm. \textbf{TIGER} explicitly decouples glyph restoration from image enhancement: it first reconstructs precise text structures and then uses them to guide the subsequent full-image super-resolution. This glyph-to-image guidance ensures both high fidelity and visual consistency. To support comprehensive training and evaluation, we also contribute \textbf{UltraZoom-ST} (UltraZoom-Scene Text), the first scene text dataset with extreme zoom (\textbf{$\times$14.29}). Extensive experiments show that \textbf{TIGER} achieves \textbf{state-of-the-art} performance, enhancing readability while preserving overall image quality.