The ability of a document classifier to handle inputs drawn from a distribution different from the training distribution is crucial for robust deployment and generalizability. The RVL-CDIP corpus is the de facto standard benchmark for document classification, yet to our knowledge no study that uses this corpus includes evaluation on out-of-distribution documents. In this paper, we curate and release a new benchmark for evaluating the out-of-distribution performance of document classifiers. The benchmark consists of two types of documents: those that do not belong to any of the 16 in-domain RVL-CDIP categories (RVL-CDIP-O), and those that belong to one of the 16 in-domain categories yet are drawn from a distribution different from that of the original RVL-CDIP dataset (RVL-CDIP-N). While prior work on document classification for in-domain RVL-CDIP documents reports high accuracy scores, we find that these models exhibit accuracy drops of roughly 15-30% on our new out-of-distribution RVL-CDIP-N benchmark, and further struggle to distinguish between in-domain RVL-CDIP-N and out-of-domain RVL-CDIP-O inputs. Our benchmark provides researchers with a valuable new resource for analyzing the out-of-distribution performance of document classifiers. The data can be found at https://tinyurl.com/4he6my23.