OCR Papers - Zhuanzhi (专知)

OCR

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?

Arxiv

0+ reads · December 23

Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Arxiv

0+ reads · December 19

Vision Token Masking Alone Cannot Prevent PHI Leakage in Medical Document OCR: A Systematic Evaluation

Arxiv

0+ reads · November 23

RubricRL: Simple Generalizable Rewards for Text-to-Image Generation

Arxiv

0+ reads · November 25

VTCBench: Can Vision-Language Models Understand Long Context with Vision-Text Compression?

Arxiv

0+ reads · December 17

Cascaded Robust Rectification for Arbitrary Document Images

Arxiv

0+ reads · November 28

Beyond Patch Aggregation: 3-Pass Pyramid Indexing for Vision-Enhanced Document Retrieval

Arxiv

0+ reads · November 26

DKDS: A Benchmark Dataset of Degraded Kuzushiji Documents with Seals for Detection and Binarization

Arxiv

0+ reads · December 18

GLYPH-SR: Can We Achieve Both High-Quality Image Super-Resolution and High-Fidelity Text Recovery via VLM-guided Latent Diffusion Model?

Arxiv

0+ reads · October 30

Multi-Stage Field Extraction of Financial Documents with OCR and Compact Vision-Language Models

Arxiv

0+ reads · October 27

olmOCR 2: Unit Test Rewards for Document OCR

Arxiv

0+ reads · October 22

Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition

Arxiv

0+ reads · October 22

Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR

Arxiv

0+ reads · October 20

ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents

Arxiv

0+ reads · October 17

End-to-End Semantic Preservation in Text-Aware Image Compression Systems

Arxiv

0+ reads · October 15
