This paper introduces Augraphy, a Python package for realistic data augmentation of document images. Augraphy applies a range of augmentation strategies to clean document images, producing versions that appear to have been distorted by standard office operations, such as printing, scanning, and faxing through old or dirty machines, as well as by degradation of ink over time and handwritten markings. Augraphy can be used as a data augmentation tool for both (1) producing diverse training data for tasks such as document denoising and (2) generating challenging test data for evaluating model robustness on document image modeling tasks. This paper provides an overview of Augraphy and presents three example use cases of Augraphy for robustness testing.
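To make the robustness-testing use concrete, the following is a minimal sketch of the general idea: degrade a clean page, then measure how much a downstream step suffers. It deliberately does not use Augraphy's API. The synthetic page, the toy `degrade` function (ink fading plus speckle), and the threshold "model" are all hypothetical stand-ins built with NumPy only, chosen for illustration rather than taken from the paper.

```python
import numpy as np


def make_clean_page(height=64, width=64):
    """Synthetic 'clean document': white page with dark, text-like bars."""
    page = np.full((height, width), 255, dtype=np.uint8)
    page[8::16, 8:-8] = 0  # one dark row every 16 rows, with side margins
    return page


def degrade(page, rng, speckle_prob=0.02, fade=0.3):
    """Toy degradation (an assumption, not Augraphy): ink fading + speckle."""
    img = page.astype(np.float32)
    # Ink fading: move every pixel a fraction `fade` of the way toward white.
    img = img + fade * (255.0 - img)
    # Speckle: overwrite a random subset of pixels with pure black or white.
    mask = rng.random(page.shape) < speckle_prob
    img[mask] = rng.choice([0.0, 255.0], size=int(mask.sum()))
    return np.clip(img, 0, 255).astype(np.uint8)


def binarize(img, threshold=128):
    """Stand-in 'model': label a pixel as ink if it is darker than threshold."""
    return img < threshold


def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted ink label matches the reference."""
    return float((pred == truth).mean())


rng = np.random.default_rng(0)
clean = make_clean_page()
noisy = degrade(clean, rng)
truth = binarize(clean)  # reference labels come from the clean page

print("accuracy on clean page:   ", pixel_accuracy(binarize(clean), truth))
print("accuracy on degraded page:", pixel_accuracy(binarize(noisy), truth))
```

The same clean and degraded pairs could also serve as training data for a denoising model, with the degraded page as input and the clean page as target.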