Paper Title
Plan ahead: Self-Supervised Text Planning for Paragraph Completion Task
Paper Authors
Paper Abstract
Despite the recent success of contextualized language models on various NLP tasks, the language model itself cannot capture the textual coherence of a long, multi-sentence document (e.g., a paragraph). Humans often make structural decisions about what to say and how to say it before making utterances. Guiding surface realization with such high-level decisions and structuring text in a coherent way is essentially called a planning process. Where can the model learn such high-level coherence? A paragraph itself contains various forms of inductive coherence signals, called self-supervision in this work, such as sentence order, topical keywords, rhetorical structures, and so on. Motivated by this, this work proposes a new paragraph completion task, PARCOM: predicting masked sentences in a paragraph. However, the task suffers from the difficulty of predicting and selecting appropriate topical content with respect to the given context. To address this, we propose a self-supervised text planner, SSPlanner, which first predicts what to say (content prediction) and then guides the pretrained language model (surface realization) using the predicted content. SSPlanner outperforms the baseline generation models on the paragraph completion task in both automatic and human evaluation. We also find that a combination of noun and verb keywords is the most effective for content selection. As more content keywords are provided, overall generation quality also increases.
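The two-stage setup described above (mask a sentence to form a PARCOM example, then derive a self-supervised content plan from the target sentence) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `extract_content_keywords` and `content_vocab` are hypothetical stand-ins for a real part-of-speech tagger that would select noun and verb keywords, and the final generation step is only indicated in a comment.

```python
def make_parcom_example(paragraph, mask_index):
    """PARCOM setup: mask one sentence of a paragraph; the model
    must predict the masked sentence from the remaining context."""
    context = [s for i, s in enumerate(paragraph) if i != mask_index]
    target = paragraph[mask_index]
    return context, target

def extract_content_keywords(sentence, content_vocab):
    """Self-supervised content plan: keywords drawn from the target
    sentence. `content_vocab` is a toy stand-in (hypothetical) for a
    POS tagger that would keep only nouns and verbs."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    return [w for w in words if w in content_vocab]

paragraph = [
    "The model reads the surrounding sentences.",
    "It predicts keywords before generating text.",
    "Finally, it realizes the sentence from the plan.",
]

# Stage 0: build a paragraph completion example.
context, target = make_parcom_example(paragraph, mask_index=1)

# Stage 1 (content prediction target): keywords from the masked sentence
# serve as free supervision for training the planner.
plan = extract_content_keywords(target, {"predicts", "keywords", "text"})

# Stage 2 (surface realization, not shown): a pretrained LM would be
# conditioned on both the context and the predicted plan,
# e.g. generate(context, plan) -> completed sentence.
print(plan)
```

At test time, the trained planner predicts the keyword plan itself, since the masked sentence is unavailable; the division into content prediction and surface realization mirrors the classic planning pipeline the abstract describes.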