Data-to-Text Generation with Iterative Text Editing

Paper Title

Data-to-Text Generation with Iterative Text Editing

Authors

Zdeněk Kasner, Ondřej Dušek

Abstract

We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG, Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility for zero-shot domain adaptation using a general-domain dataset for sentence fusion.
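The pipeline the abstract describes (templates, iterative sentence fusion, heuristic filtering, language-model reranking) can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' code: the templates in `TEMPLATES` are hypothetical, `toy_fuse` is a hand-written stand-in for a trained LaserTagger sentence-fusion model, `toy_score` is a stand-in for a GPT-2 fluency score, and both the entity-presence filter and the fallback when every candidate is rejected are our guesses at what a "simple heuristic" might look like. We also assume one new fact is fused into the text per iteration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical single-fact templates; the paper's own templates are not shown here.
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} is a {o}.",
}


def lexicalize(triple: Triple) -> str:
    """Turn one triple into a sentence with a trivial template."""
    s, p, o = triple
    # Fallback template for unknown predicates (assumption): "subject predicate object."
    return TEMPLATES.get(p, "{s} {p} {o}.").format(s=s, p=p, o=o)


def toy_fuse(text: str, sentence: str, subject: str) -> List[str]:
    """Hypothetical stand-in for a LaserTagger sentence-fusion model.

    Returns several candidate fusions of `text` and `sentence`.
    """
    candidates = [f"{text} {sentence}"]  # plain concatenation
    if text.endswith(".") and sentence.startswith(subject + " "):
        # Merge sentences sharing a subject: "X did A." + "X did B." -> "X did A and did B."
        candidates.append(text[:-1] + " and " + sentence[len(subject) + 1:])
    candidates.append(text)  # a bad candidate that drops the new fact
    return candidates


def keeps_all_entities(candidate: str, entities: Iterable[str]) -> bool:
    """Assumed heuristic filter: every entity seen so far must appear verbatim."""
    return all(entity in candidate for entity in entities)


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a GPT-2 fluency score (higher is better): prefers fewer words."""
    return -len(text.split())


def generate(
    triples: Sequence[Triple],
    fuse: Callable[[str, str, str], List[str]],
    score: Callable[[str], float],
) -> str:
    """Add one templated fact per step, filter the fused candidates, then rerank them."""
    first = triples[0]
    text = lexicalize(first)
    entities = {first[0], first[2]}
    for triple in triples[1:]:
        subject, _, obj = triple
        entities.update((subject, obj))
        sentence = lexicalize(triple)
        candidates = [c for c in fuse(text, sentence, subject) if keeps_all_entities(c, entities)]
        if not candidates:
            # Assumed fallback: keep the unfused concatenation so no fact is lost.
            candidates = [f"{text} {sentence}"]
        text = max(candidates, key=score)
    return text


EXAMPLE_TRIPLES = [
    ("Alan Turing", "birthPlace", "London"),
    ("Alan Turing", "occupation", "mathematician"),
]

if __name__ == "__main__":
    # prints: Alan Turing was born in London and is a mathematician.
    print(generate(EXAMPLE_TRIPLES, toy_fuse, toy_score))
```

The fusion model and the scorer are passed in as callables, so the toy stand-ins could in principle be swapped for a real sentence-fusion model and a language-model likelihood without changing `generate`. Because the filter only keeps candidates that mention every input entity, a fusion step that silently drops a fact is rejected, which is how this sketch mirrors the abstract's emphasis on completeness and semantic accuracy.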