Title
What Happens To BERT Embeddings During Fine-tuning?
Authors
Abstract
While there has been much recent work studying how linguistic information is encoded in pre-trained sentence representations, comparatively little is understood about how these models change when adapted to solve downstream tasks. Using a suite of analysis techniques (probing classifiers, Representational Similarity Analysis, and model ablations), we investigate how fine-tuning affects the representations of the BERT model. We find that while fine-tuning necessarily makes significant changes, it does not lead to catastrophic forgetting of linguistic phenomena. We instead find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks. In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing. Finally, we also find that fine-tuning has a weaker effect on representations of out-of-domain sentences, suggesting room for improvement in model generalization.
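As an illustration of one of the analysis techniques the abstract names, the following is a minimal sketch of Representational Similarity Analysis (RSA) applied to two sets of sentence representations, e.g. activations from a pre-trained versus a fine-tuned BERT layer. The function name and the toy random data are illustrative assumptions, not the paper's actual implementation; RSA in general compares the pairwise-similarity structure induced by each representation space.

```python
import numpy as np

def rsa_similarity(reps_a, reps_b):
    """Compare two representation spaces via RSA.

    reps_a, reps_b: (n_sentences, dim) arrays of sentence representations.
    Builds a cosine-similarity matrix for each space, then returns the
    Pearson correlation between the two matrices' upper triangles.
    """
    def sim_matrix(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-normalize rows
        return x @ x.T                                    # pairwise cosine sims

    n = reps_a.shape[0]
    iu = np.triu_indices(n, k=1)  # upper triangle, excluding the diagonal
    return np.corrcoef(sim_matrix(reps_a)[iu], sim_matrix(reps_b)[iu])[0, 1]

# Toy data standing in for layer activations (hypothetical, for illustration):
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(50, 768))
finetuned = pretrained + 0.1 * rng.normal(size=(50, 768))  # small perturbation

score = rsa_similarity(pretrained, finetuned)
print(score)  # near 1.0: the similarity structure is largely preserved
```

A score near 1 indicates that fine-tuning left the layer's similarity structure largely intact, while lower scores at the top layers would reflect the larger changes the paper reports there.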