This paper presents FAMIE, a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction. FAMIE is designed to address a fundamental problem in existing AL frameworks: annotators must wait a long time between annotation batches because model training and data selection at each AL iteration are time-consuming. This hinders the engagement, productivity, and efficiency of annotators. Building on the idea of using a small proxy network for fast data selection, we introduce a novel knowledge distillation mechanism that synchronizes the proxy network with the main large model (i.e., BERT-based) to ensure that the selected annotation examples are appropriate for the main model. Our AL framework supports multiple languages. Experiments demonstrate the advantages of FAMIE in terms of competitive performance and time efficiency for sequence labeling with AL. We publicly release our code (\url{https://github.com/nlp-uoregon/famie}) and demo website (\url{http://nlp.uoregon.edu:9000/}). A demo video for FAMIE is provided at: \url{https://youtu.be/I2i8n_jAyrY}.
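To make the proxy-synchronization idea concrete, the following is a minimal sketch of a standard knowledge-distillation objective of the kind the abstract describes: the small proxy (student) is trained to match the temperature-softened output distribution of the large main model (teacher). All function names, the temperature value, and the use of a plain KL term are illustrative assumptions; FAMIE's exact distillation mechanism may differ.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T yields softer distributions.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, averaged over
    tokens. The T^2 factor follows the common Hinton-style scaling.
    This is an illustrative sketch, not FAMIE's exact objective."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    eps = 1e-12  # avoid log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return float(np.mean(kl)) * temperature ** 2

# Hypothetical usage: per-token label logits for a 3-class tagging task.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]])
student = np.array([[3.0, 1.5, 0.8], [0.5, 2.9, 0.4]])
loss = distillation_loss(student, teacher)
```

Minimizing such a loss during each AL iteration keeps the cheap proxy's ranking of unlabeled examples aligned with the expensive main model, so data selection can run on the proxy without stalling annotators.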