MatteViT: High-Frequency-Aware Document Shadow Removal with Shadow Matte Guidance

Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critical in this process because shadows often obscure or distort fine structures. This paper proposes a matte vision transformer (MatteViT), a novel shadow removal framework that applies spatial and frequency-domain information to eliminate shadows while preserving fine-grained structural details. To effectively retain these details, we employ two preservation strategies. First, our method introduces a lightweight high-frequency amplification module (HFAM) that decomposes and adaptively amplifies high-frequency components. Second, we present a continuous luminance-based shadow matte, generated using a custom-built matte dataset and shadow matte generator, which provides precise spatial guidance from the earliest processing stage. These strategies enable the model to accurately identify fine-grained regions and restore them with high fidelity. Extensive experiments on public benchmarks (RDD and Kligler) demonstrate that MatteViT achieves state-of-the-art performance, providing a robust and practical solution for real-world document shadow removal. Furthermore, the proposed method better preserves text-level details in downstream tasks, such as optical character recognition, improving recognition performance over prior methods.

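The idea of decomposing an image and amplifying its high-frequency components can be illustrated with a minimal sketch. This is not the HFAM described in the abstract, which is a learned, adaptive module inside the network; here a fixed box blur stands in for the low-pass decomposition and a constant scalar `gain` stands in for the adaptive amplification. The function names `box_blur` and `amplify_high_freq` and the parameters `k` and `gain` are hypothetical, chosen only for illustration.

```python
import numpy as np

def box_blur(img, k=5):
    """Low-pass filter: mean over a k x k window (k odd), with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    h, w = img.shape
    # Sum the k*k shifted copies of the image, then normalize.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def amplify_high_freq(img, gain=1.5, k=5):
    """Split a grayscale image into low/high bands and scale the high band.

    Assumption: a fixed scalar gain replaces the learned, adaptive
    amplification that the abstract attributes to HFAM.
    """
    img = img.astype(np.float64)
    low = box_blur(img, k)   # smooth content such as soft shading
    high = img - low         # residual: text edges, thin lines
    return low + gain * high, low, high

# Usage: a vertical step edge standing in for a text stroke boundary.
page = np.zeros((8, 8))
page[:, 4:] = 1.0
sharpened, low, high = amplify_high_freq(page, gain=2.0)
```

With `gain=1.0` the input is reconstructed exactly, since `low + high` equals the original image; values above 1 increase contrast around edges while leaving flat regions unchanged.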