Paper Title

Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation

Paper Authors

Chenze Shao, Yang Feng

Paper Abstract

Neural networks tend to gradually forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions. This problem is called \textit{catastrophic forgetting}, which is a fundamental challenge in the continual learning of neural networks. In this work, we observe that catastrophic forgetting not only occurs in continual learning but also affects the traditional static training. Neural networks, especially neural machine translation models, suffer from catastrophic forgetting even if they learn from a static training set. To be specific, the final model pays imbalanced attention to training samples, where recently exposed samples attract more attention than earlier samples. The underlying cause is that training samples do not get balanced training in each model update, so we name this problem \textit{imbalanced training}. To alleviate this problem, we propose Complementary Online Knowledge Distillation (COKD), which uses dynamically updated teacher models trained on specific data orders to iteratively provide complementary knowledge to the student model. Experimental results on multiple machine translation tasks show that our method successfully alleviates the problem of imbalanced training and achieves substantial improvements over strong baseline systems.
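
The abstract describes COKD only at a high level. The sketch below is a minimal, hypothetical illustration of the general mechanism it outlines: a teacher model updated online on a complementary (here, reversed) data order, distilling into a student trained on the original order. A toy classifier stands in for an NMT model, and all names and hyperparameters (make_model, alpha, the temperature T, the batch ordering) are assumptions for illustration, not the paper's actual algorithm.

```python
# Minimal sketch of the idea behind complementary online knowledge
# distillation (COKD) as described in the abstract. This is NOT the
# authors' implementation: the toy model, data, loss weighting, and
# training schedule are hypothetical stand-ins for an NMT system.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy data: 256 samples, 16 features, 4 classes (stand-in for translation batches).
X = torch.randn(256, 16)
y = torch.randint(0, 4, (256,))
batches = list(torch.split(torch.arange(256), 32))  # fixed data order

def make_model():
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

student = make_model()
teacher = make_model()
opt_s = torch.optim.SGD(student.parameters(), lr=0.1)
opt_t = torch.optim.SGD(teacher.parameters(), lr=0.1)

alpha, T = 0.5, 2.0  # hypothetical distillation weight and temperature

for epoch in range(5):
    # 1) Update the teacher on the complementary (reversed) data order,
    #    so it attends most to the samples the student saw earliest.
    for idx in reversed(batches):
        loss_t = F.cross_entropy(teacher(X[idx]), y[idx])
        opt_t.zero_grad(); loss_t.backward(); opt_t.step()

    # 2) Update the student on the original order, adding a distillation
    #    term from the teacher so knowledge about earlier samples is not
    #    washed out by the most recently exposed ones.
    for idx in batches:
        logits_s = student(X[idx])
        with torch.no_grad():
            logits_t = teacher(X[idx])
        ce = F.cross_entropy(logits_s, y[idx])
        kd = F.kl_div(F.log_softmax(logits_s / T, dim=-1),
                      F.softmax(logits_t / T, dim=-1),
                      reduction="batchmean") * (T * T)
        loss_s = (1 - alpha) * ce + alpha * kd
        opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

In this sketch the teacher is refreshed every epoch rather than kept static, which mirrors the "dynamically updated teacher" idea; how the complementary data orders are chosen and how teachers are scheduled in the actual COKD method is detailed in the paper itself.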
