Paper title
Adversarial Transfer Learning for Punctuation Restoration
Paper authors
Paper abstract
Previous studies have shown that word embeddings and part-of-speech (POS) tags help punctuation restoration. However, two drawbacks remain. First, the word embeddings are pre-trained with unidirectional language modeling objectives, so they contain only left-to-right context information. Second, the POS tags are provided by an external POS tagger, which increases computation cost, and incorrectly predicted tags may degrade punctuation restoration during decoding. This paper proposes adversarial transfer learning to address these problems. A pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is used to initialize the punctuation model, so the transferred model parameters carry both left-to-right and right-to-left representations. Furthermore, adversarial multi-task learning is introduced to learn task-invariant knowledge for punctuation prediction. An extra POS tagging task is used to help train the punctuation prediction task, and adversarial training prevents the shared parameters from containing task-specific information. Only the punctuation prediction task is used to restore marks at the decoding stage, so no extra computation is needed and no incorrect tags are introduced by a POS tagger. Experiments are conducted on the IWSLT2011 datasets. The results demonstrate that the punctuation prediction models obtain further performance improvement with task-invariant knowledge from the POS tagging task. Our best model outperforms the previous state-of-the-art model trained only with lexical features by up to 9.2% absolute overall F_1-score on the test set.
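
The training and decoding setup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the paper: the label set `PUNCT`, the helper names, the weight `adv_weight`, and the particular form of the adversarial objective (task losses minus a weighted discriminator loss, one common way to discourage task-specific information in shared parameters). The sketch only shows how a shared-parameter objective might combine the two task losses with an adversarial term, and how decoding would rely on the punctuation labels alone.

```python
import math

# Hypothetical label set; the abstract does not list the punctuation classes.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class under a probability list."""
    return -math.log(probs[gold_index])


def shared_encoder_objective(punct_loss, pos_loss, disc_loss, adv_weight=0.1):
    """Loss minimised by the shared parameters (assumed formulation).

    A task discriminator would minimise disc_loss on its own parameters;
    the shared encoder benefits when the discriminator cannot tell the
    tasks apart, hence the minus sign on the adversarial term.
    """
    return punct_loss + pos_loss - adv_weight * disc_loss


def restore_punctuation(words, labels):
    """Attach the predicted mark (if any) after each word.

    Only punctuation labels are needed here; no POS tags are consulted,
    which mirrors the decoding stage described in the abstract.
    """
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    return " ".join(word + PUNCT[label] for word, label in zip(words, labels))


if __name__ == "__main__":
    words = ["how", "are", "you", "i", "am", "fine"]
    labels = ["O", "O", "QUESTION", "O", "O", "PERIOD"]
    print(restore_punctuation(words, labels))  # how are you? i am fine.
```

In a real system the losses would come from a BERT-initialized encoder with separate punctuation and POS output layers; the scalar arithmetic above only makes the direction of each term explicit.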
