Technology-assisted review (TAR) refers to human-in-the-loop active learning workflows for finding relevant documents in large collections. These workflows often must meet a target for the proportion of relevant documents found (i.e., recall) while also keeping review costs low. A variety of heuristic stopping rules have been suggested for striking this tradeoff in particular settings, but none has been tested against a range of recall targets and tasks. We propose two new heuristic stopping rules, Quant and QuantCI, based on model-based estimation techniques from survey research. We compare them against a range of previously proposed heuristics and find that they hit a range of recall targets accurately while substantially reducing review costs.
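To make the idea of a recall-targeted stopping rule concrete, the following is a minimal sketch of one way a model-based estimate could drive the stopping decision. It is an illustration under stated assumptions, not the definition of Quant or QuantCI: the function names `estimated_recall` and `should_stop` are hypothetical, and the sketch assumes the classifier's scores for unreviewed documents can be read as calibrated probabilities of relevance, so that their sum approximates the number of relevant documents still unreviewed.

```python
from typing import Sequence


def estimated_recall(found_relevant: int, unreviewed_scores: Sequence[float]) -> float:
    """Estimate recall from reviewed labels plus model scores (illustrative only).

    Assumption: each score is a calibrated probability that the corresponding
    unreviewed document is relevant, so the sum of scores approximates the
    number of relevant documents not yet found.
    """
    expected_remaining = sum(unreviewed_scores)
    expected_total = found_relevant + expected_remaining
    if expected_total == 0:
        # Nothing found and nothing expected: treat recall as trivially met.
        return 1.0
    return found_relevant / expected_total


def should_stop(found_relevant: int,
                unreviewed_scores: Sequence[float],
                target_recall: float) -> bool:
    """Return True once the estimated recall reaches the target."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    return estimated_recall(found_relevant, unreviewed_scores) >= target_recall


if __name__ == "__main__":
    # Hypothetical state: 80 relevant documents found so far, 40 unreviewed
    # documents each scored 0.5 by the model (about 20 expected relevant).
    scores = [0.5] * 40
    print(estimated_recall(80, scores))
    print(should_stop(80, scores, target_recall=0.75))
    print(should_stop(80, scores, target_recall=0.90))
```

A point estimate like this can overshoot or undershoot the true recall. A more conservative variant would stop only when a lower confidence bound on the estimate clears the target, trading extra review cost for a higher chance of meeting it.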