Technology-assisted review (TAR) is an important industrial application of information retrieval (IR) and machine learning (ML). While a small TAR research community exists, the complexity of TAR software and workflows is a major barrier to entry. Drawing on past open source TAR efforts, as well as design patterns from open source IR and ML software, we present an open source Python framework for conducting experiments on TAR algorithms. Key characteristics of this framework are declarative representations of workflows and experiment plans, the ability for a component to play a variable number of workflow roles, and state maintenance and restart capabilities. Users can draw on reference implementations of standard TAR algorithms while incorporating novel components to explore their research interests. The framework is available at https://github.com/eugene-yang/tarexp.
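To make two of these characteristics concrete, the following is a minimal sketch of a declaratively configured review loop that saves its state after every round and can be restarted from that state. It is not the tarexp API: `WorkflowSetting`, `run`, the `interrupt_at` argument, the term-overlap ranker, and the toy documents and labels are all hypothetical and chosen only for illustration.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WorkflowSetting:
    """Declarative description of a toy workflow (hypothetical, not tarexp)."""
    batch_size: int = 1
    max_rounds: int = 4


def run(setting, docs, labels, state_path, interrupt_at=None):
    """Run review rounds, saving state after each one so a run can resume."""
    if state_path.exists():
        state = json.loads(state_path.read_text())  # restart from saved state
    else:
        state = {"round": 0, "reviewed": []}

    while state["round"] < setting.max_rounds:
        if interrupt_at is not None and state["round"] == interrupt_at:
            return state  # simulate an interrupted experiment

        reviewed = set(state["reviewed"])
        # Terms seen in documents already reviewed and labeled relevant.
        pos_terms = set().union(*(docs[i] for i in reviewed if labels[i]))
        remaining = [i for i in range(len(docs)) if i not in reviewed]
        if not remaining:
            break

        # Toy ranker: most term overlap first, ties broken by document index.
        remaining.sort(key=lambda i: (-len(docs[i] & pos_terms), i))
        state["reviewed"].extend(remaining[:setting.batch_size])
        state["round"] += 1
        state_path.write_text(json.dumps(state))
    return state


# Toy collection: each document is a set of terms, with a relevance label.
docs = [{"contract", "breach"}, {"lunch", "menu"}, {"breach", "damages"},
        {"menu", "party"}, {"damages", "claim"}, {"party", "invite"}]
labels = [True, False, True, False, True, False]
setting = WorkflowSetting(batch_size=1, max_rounds=4)

with tempfile.TemporaryDirectory() as tmp:
    fresh = run(setting, docs, labels, Path(tmp) / "fresh.json")
    path = Path(tmp) / "resumed.json"
    run(setting, docs, labels, path, interrupt_at=2)  # stops after two rounds
    resumed = run(setting, docs, labels, path)        # continues from disk
```

Because the ranking in this sketch is deterministic, the interrupted-then-resumed run reviews the same documents in the same order as the uninterrupted one. A workflow with randomized components would also need to persist its random state to guarantee this.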