Paper Title

Sub-Task Decomposition Enables Learning in Sequence to Sequence Tasks

Paper Authors

Noam Wies, Yoav Levine, Amnon Shashua

Paper Abstract

The field of Natural Language Processing has experienced a dramatic leap in capabilities with the recent introduction of huge Language Models. Despite this success, natural language problems that involve several compounded steps are still practically unlearnable, even by the largest LMs. This complies with experimental failures for end-to-end learning of composite problems that were demonstrated in a variety of domains. An effective mitigation is to introduce intermediate supervision for solving sub-tasks of the compounded problem. Recently, several works have demonstrated high gains by taking a straightforward approach for incorporating intermediate supervision in compounded natural language problems: the sequence-to-sequence LM is fed with an augmented input, in which the decomposed tasks' labels are simply concatenated to the original input. In this paper, we prove a positive learning result that motivates these recent efforts. We show that when concatenating intermediate supervision to the input and training a sequence-to-sequence model on this modified input, unlearnable composite problems can become learnable. We show that this is true for any family of tasks which, on the one hand, are unlearnable, and on the other hand, can be decomposed into a polynomial number of simple sub-tasks, each of which depends only on O(1) previous sub-task results. Beyond motivating contemporary empirical efforts for incorporating intermediate supervision in sequence-to-sequence language models, our positive theoretical result is the first of its kind in the landscape of results on the benefits of intermediate supervision for neural-network learning: Until now, all theoretical results on the subject have been negative, i.e., they show cases where learning is impossible without intermediate supervision, while our result is positive, showing that learning is facilitated in the presence of intermediate supervision.
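To make the input-augmentation scheme described in the abstract concrete, below is a minimal Python sketch (not the authors' code) of how sub-task labels can be concatenated to the original input before sequence-to-sequence training. The function name `augment_input`, the separator token, and the toy arithmetic example are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the input-augmentation approach: the labels of the
# decomposed sub-tasks are simply concatenated to the original input, and the
# sequence-to-sequence LM is trained on this modified input. All names and the
# separator token here are hypothetical choices for illustration.

from typing import List


def augment_input(original_input: str,
                  sub_task_labels: List[str],
                  sep: str = " ; ") -> str:
    """Concatenate intermediate sub-task labels onto the original input.

    The returned string is what the sequence-to-sequence model would be
    trained on, so the intermediate supervision appears alongside the raw
    input rather than only in the final target.
    """
    return original_input + sep + sep.join(sub_task_labels)


if __name__ == "__main__":
    # Toy compounded problem: multi-step arithmetic decomposed into simple
    # sub-tasks, each depending only on O(1) previous sub-task results.
    x = "compute (2 + 3) * 4"
    intermediate = ["2 + 3 = 5", "5 * 4 = 20"]
    print(augment_input(x, intermediate))
    # -> "compute (2 + 3) * 4 ; 2 + 3 = 5 ; 5 * 4 = 20"
```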
