Technology-assisted review (TAR) workflows based on iterative active learning are widely used in document review applications. Most stopping rules for one-phase TAR workflows lack valid statistical guarantees, which has discouraged their use in some legal contexts. Drawing on the theory of quantile estimation, we provide the first broadly applicable and statistically valid sample-based stopping rules for one-phase TAR. We further show theoretically and empirically that overshooting a recall target, which has been treated as innocuous or desirable in past evaluations of stopping rules, is a major source of excess cost in one-phase TAR workflows. Counterintuitively, incurring a larger sampling cost to reduce excess recall leads to lower total cost in almost all scenarios.
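The abstract does not spell out the stopping rules themselves, so the following is only an illustrative sketch of how quantile estimation can yield a sample-based rule, not necessarily the authors' exact method. It assumes a simple random sample containing `r` relevant documents is drawn before review, that the TAR process reviews documents in ranked order, and that review stops once the `j`-th sampled relevant document has been found. Under those assumptions, the standard distribution-free upper confidence bound for a quantile suggests choosing the smallest `j` such that a Binomial(`r`, `t`) variable is at most `j - 1` with probability at least `1 - alpha`, where `t` is the recall target. The function name and parameters are hypothetical.

```python
from math import comb


def quantile_stopping_index(r, t, alpha):
    """Return the smallest j in 1..r such that stopping after the j-th sampled
    relevant document is found gives recall >= t with confidence >= 1 - alpha,
    or None if the sample of r relevant documents is too small.

    Illustrative assumption: uses the distribution-free order-statistic bound
    P(Binomial(r, t) <= j - 1) >= 1 - alpha.
    """
    if not (0.0 < t < 1.0) or not (0.0 < alpha < 1.0) or r < 1:
        raise ValueError("need r >= 1, 0 < t < 1, 0 < alpha < 1")

    cdf = 0.0
    for j in range(1, r + 1):
        k = j - 1
        # Add P(Binomial(r, t) == j - 1) to the running cumulative probability.
        cdf += comb(r, k) * (t ** k) * ((1.0 - t) ** (r - k))
        if cdf >= 1.0 - alpha:
            return j
    return None


if __name__ == "__main__":
    # Example: 80% recall target at 95% confidence for several sample sizes.
    for r in (13, 14, 30, 60):
        print(r, quantile_stopping_index(r, t=0.8, alpha=0.05))
```

In this sketch, a larger sample lets `j / r` sit closer to the recall target `t`, which is one way to read the abstract's claim that spending more on sampling can reduce excess recall and therefore total cost.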