Paper Title

Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models

Authors

Terra Blevins, Hila Gonen, Luke Zettlemoyer

Abstract

The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones. In contrast, the point in pretraining when the model learns to transfer cross-lingually differs across language pairs. Interestingly, we also observe that, across many languages and tasks, the final model layer exhibits significant performance degradation over time, while linguistic knowledge propagates to lower layers of the network. Taken together, these insights highlight the complexity of multilingual pretraining and the resulting varied behavior for different languages over time.
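To make the probing setup described above concrete, the sketch below shows one way to train a layer-wise linear probe on hidden states from a multilingual encoder checkpoint, assuming the Hugging Face transformers library and scikit-learn. The checkpoint name "xlm-roberta-base" stands in for an intermediate pretraining checkpoint (which would be loaded from a local path), and the toy labeled sentences are illustrative placeholders, not the authors' actual probing suite.

# Minimal sketch: layer-wise linear probing of a multilingual encoder checkpoint.
# Assumptions: "xlm-roberta-base" is a placeholder for a saved intermediate
# checkpoint, and the toy (sentence, label) pairs stand in for a real linguistic
# probing dataset such as POS tagging or NER.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

MODEL_NAME = "xlm-roberta-base"  # swap in a checkpoint directory to probe a specific training step
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True).eval()

# Toy labeled data; a real probe would use annotated in-language or cross-lingual examples.
train = [("The cat sleeps .", 0), ("Dogs bark loudly .", 1)]
test = [("The dog sleeps .", 0), ("Cats purr softly .", 1)]

def sentence_features(examples, layer):
    """Mean-pool the hidden states of a chosen layer as fixed features for the probe."""
    feats, labels = [], []
    with torch.no_grad():
        for text, label in examples:
            enc = tokenizer(text, return_tensors="pt")
            hidden = model(**enc).hidden_states[layer]  # shape: (1, seq_len, hidden_dim)
            feats.append(hidden.mean(dim=1).squeeze(0).numpy())
            labels.append(label)
    return feats, labels

# Fit a separate linear probe on each layer (embedding layer + every transformer layer)
# and report its accuracy, mirroring the layer-wise analysis described in the abstract.
for layer in range(model.config.num_hidden_layers + 1):
    X_tr, y_tr = sentence_features(train, layer)
    X_te, y_te = sentence_features(test, layer)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer}: probe accuracy = {accuracy_score(y_te, probe.predict(X_te)):.2f}")

Repeating this loop over checkpoints saved at different pretraining steps would yield the kind of per-layer, per-step performance curves the abstract refers to; for cross-lingual transfer, the probe would instead be trained on one language and evaluated on another.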
