Paper Title

Compressive Summarization with Plausibility and Salience Modeling

Paper Authors

Shrey Desai, Jiacheng Xu, Greg Durrett

Paper Abstract

Compressive summarization systems typically rely on a crafted set of syntactic rules to determine what spans of possible summary sentences can be deleted, then learn a model of what to actually delete by optimizing for content selection (ROUGE). In this work, we propose to relax the rigid syntactic constraints on candidate spans and instead leave compression decisions to two data-driven criteria: plausibility and salience. Deleting a span is plausible if removing it maintains the grammaticality and factuality of a sentence, and spans are salient if they contain important information from the summary. Each of these is judged by a pre-trained Transformer model, and only deletions that are both plausible and not salient can be applied. When integrated into a simple extraction-compression pipeline, our method achieves strong in-domain results on benchmark summarization datasets, and human evaluation shows that the plausibility model generally selects for grammatical and factual deletions. Furthermore, the flexibility of our approach allows it to generalize cross-domain: our system fine-tuned on only 500 samples from a new domain can match or exceed an in-domain extractive model trained on much more data.
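The core decision rule in the abstract, that a deletion is applied only if it is plausible and not salient, can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' implementation: the function name, threshold values, and score inputs are all assumptions, standing in for the paper's pre-trained Transformer scorers.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) offsets of a candidate deletion span

def select_deletions(
    candidates: List[Span],
    plausibility: List[float],  # P(deleting span keeps sentence grammatical/factual)
    salience: List[float],      # P(span carries summary-worthy information)
    plausibility_threshold: float = 0.5,  # hypothetical tuning knob
    salience_threshold: float = 0.5,      # hypothetical tuning knob
) -> List[Span]:
    """Apply only deletions that are both plausible AND not salient,
    mirroring the decision rule described in the abstract."""
    return [
        span
        for span, p, s in zip(candidates, plausibility, salience)
        if p >= plausibility_threshold and s < salience_threshold
    ]

# Toy usage with three candidate spans and made-up model scores:
spans = [(3, 7), (10, 12), (15, 20)]
p_scores = [0.9, 0.8, 0.3]  # plausibility per span
s_scores = [0.2, 0.7, 0.1]  # salience per span
print(select_deletions(spans, p_scores, s_scores))  # -> [(3, 7)]
```

In this toy run, only the first span survives: the second is plausible but too salient to remove, and the third is not plausible to delete at all.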
