Paper Title

FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction

Paper Authors

Minh Van Nguyen, Nghia Trung Ngo, Bonan Min, Thien Huu Nguyen

Paper Abstract

This paper presents FAMIE, a comprehensive and efficient active learning (AL) toolkit for multilingual information extraction. FAMIE is designed to address a fundamental problem in existing AL frameworks where annotators need to wait for a long time between annotation batches due to the time-consuming nature of model training and data selection at each AL iteration. This hinders the engagement, productivity, and efficiency of annotators. Based on the idea of using a small proxy network for fast data selection, we introduce a novel knowledge distillation mechanism to synchronize the proxy network with the main large model (i.e., BERT-based) to ensure the appropriateness of the selected annotation examples for the main model. Our AL framework can support multiple languages. The experiments demonstrate the advantages of FAMIE in terms of competitive performance and time efficiency for sequence labeling with AL. We publicly release our code (\url{https://github.com/nlp-uoregon/famie}) and demo website (\url{http://nlp.uoregon.edu:9000/}). A demo video for FAMIE is provided at: \url{https://youtu.be/I2i8n_jAyrY}.
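
The abstract only describes the proxy-distillation idea at a high level. As a rough illustration of how such a mechanism can be set up (a minimal sketch, not FAMIE's actual implementation; the function name distillation_loss and the temperature/alpha hyperparameters are assumptions), a token-level distillation objective that keeps a small proxy tagger synchronized with the main BERT-based model might look like this:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, mask,
                      temperature=2.0, alpha=0.5):
    """Illustrative objective for syncing a small proxy tagger with a
    large main model (hypothetical sketch, not FAMIE's actual code).

    student_logits / teacher_logits: (batch, seq_len, num_tags)
    labels: (batch, seq_len) gold tag ids; padded positions are assumed
            to carry any valid tag id and are masked out below
    mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    num_tags = student_logits.size(-1)

    # Hard loss: token-level cross-entropy on the annotated gold labels.
    ce = F.cross_entropy(
        student_logits.reshape(-1, num_tags),
        labels.reshape(-1),
        reduction="none",
    )
    ce = (ce * mask.reshape(-1)).sum() / mask.sum()

    # Soft loss: match the main model's temperature-smoothed tag
    # distribution (standard Hinton-style distillation term).
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="none",
    ).sum(-1)
    kl = (kl * mask).sum() / mask.sum() * temperature ** 2

    return alpha * ce + (1.0 - alpha) * kl
```

Under this kind of objective, the lightweight proxy stays close to the main model's predictions, so it alone can score the unlabeled pool for the next annotation batch without annotators waiting for the large model.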
