A Survey on Open Information Extraction from Rule-based Model to Large Language Model

Paper Title

A Survey on Open Information Extraction from Rule-based Model to Large Language Model

Authors

Pai Liu, Wenyang Gao, Wenjie Dong, Lin Ai, Ziwei Gong, Songfang Huang, Zongsheng Li, Ehsan Hoque, Julia Hirschberg, Yue Zhang

Abstract

Open Information Extraction (OpenIE) represents a crucial NLP task aimed at deriving structured information from unstructured text, unrestricted by relation type or domain. This survey paper provides an overview of OpenIE technologies spanning from 2007 to 2024, emphasizing a chronological perspective absent in prior surveys. It examines the evolution of task settings in OpenIE to align with the advances in recent technologies. The paper categorizes OpenIE approaches into rule-based, neural, and pre-trained large language models, discussing each within a chronological framework. Additionally, it highlights prevalent datasets and evaluation metrics currently in use. Building on this extensive review, the paper outlines potential future directions in terms of datasets, information sources, output formats, methodologies, and evaluation metrics.
