Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critical in this process because shadows often obscure or distort fine structures. This paper proposes a matte vision transformer (MatteViT), a novel shadow removal framework that exploits both spatial- and frequency-domain information to eliminate shadows while preserving fine-grained structural details. To retain these details, we employ two preservation strategies. First, our method introduces a lightweight high-frequency amplification module (HFAM) that decomposes the input and adaptively amplifies its high-frequency components. Second, we present a continuous luminance-based shadow matte, generated using a custom-built matte dataset and shadow matte generator, which provides precise spatial guidance from the earliest processing stage. These strategies enable the model to accurately identify fine-grained regions and restore them with high fidelity. Extensive experiments on public benchmarks (RDD and Kligler) demonstrate that MatteViT achieves state-of-the-art performance, providing a robust and practical solution for real-world document shadow removal. Furthermore, the proposed method better preserves text-level details for downstream tasks such as optical character recognition, improving recognition performance over prior methods.
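To make the two preservation strategies concrete, the following is a minimal one-dimensional sketch, not the MatteViT implementation. It assumes a box blur as the low-pass filter, a fixed scalar `gain` standing in for the adaptive (learned) amplification of HFAM, and a simple luminance ratio between a shadowed and a shadow-free signal as the continuous matte. All function names and parameters are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names and parameters are hypothetical
# and do not come from the paper.

def box_blur(signal, radius=1):
    """Low-pass filter: mean over a sliding window, truncated at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def amplify_high_freq(signal, gain=1.5, radius=1):
    """Split the signal into low and high frequencies, then boost the high part.

    Assumption: a fixed scalar gain replaces the adaptive amplification
    described in the abstract.
    """
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]  # residual = fine detail
    return [l + gain * h for l, h in zip(low, high)]


def luminance_matte(shadowed, shadow_free, eps=1e-6):
    """Continuous matte in [0, 1]: 0 means no shadow, 1 means fully darkened.

    Assumption: the matte is derived from the luminance ratio between a
    shadowed signal and its shadow-free counterpart.
    """
    matte = []
    for s, f in zip(shadowed, shadow_free):
        ratio = s / max(f, eps)
        matte.append(min(1.0, max(0.0, 1.0 - ratio)))
    return matte


if __name__ == "__main__":
    edge = [0.0, 0.0, 1.0, 1.0]  # a step edge, e.g. a text stroke boundary
    print(amplify_high_freq(edge, gain=2.0))
    print(luminance_matte([0.5, 1.0], [1.0, 1.0]))
```

With a gain above 1, the step edge overshoots on both sides, which is the sharpening effect that high-frequency amplification is meant to produce. Unlike a binary mask, the matte takes intermediate values in partially shadowed regions, so it can represent soft shadow boundaries.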