Paper Title
Model Selection for Cross-Lingual Transfer
Paper Authors
Paper Abstract
Transformers pre-trained on multilingual corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target language performance between different fine-tuning runs, and in the zero-shot setup, no target-language development data is available to select among multiple fine-tuned models. Prior work has relied on English dev data to select among models that are fine-tuned with different learning rates, numbers of steps, and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in auxiliary pivot languages. We propose a machine learning approach to model selection that uses the fine-tuned model's own internal representations to predict its cross-lingual capabilities. In extensive experiments, we find that this method consistently selects better models than English validation data across twenty-five languages (including eight low-resource languages), and often achieves results that are comparable to model selection using target-language development data.
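
To make the selection problem the abstract describes concrete, below is a minimal Python sketch of the checkpoint-selection step: each fine-tuned candidate is scored on a small annotated dev set, which can be English dev data (the standard zero-shot baseline) or a small auxiliary pivot-language set, the alternative the abstract argues selects consistently better models. All names and types here (Candidate, accuracy, select_checkpoint) are illustrative assumptions, not the paper's code; the paper's actual proposal of learning to predict cross-lingual performance from the model's internal representations is not reproduced in this sketch.

# Hypothetical sketch: selecting among fine-tuned checkpoints with a small dev set.
# Names and data structures are placeholders, not the paper's implementation.

from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]          # (input text, gold label)
Predictor = Callable[[str], int]   # a fine-tuned classifier's prediction function

@dataclass
class Candidate:
    name: str           # e.g. "lr=2e-5, 3 epochs, seed 1"
    predict: Predictor  # the fine-tuned model being considered

def accuracy(model: Candidate, dev_set: Sequence[Example]) -> float:
    """Fraction of dev examples the candidate labels correctly."""
    correct = sum(model.predict(text) == label for text, label in dev_set)
    return correct / len(dev_set)

def select_checkpoint(candidates: List[Candidate],
                      dev_set: Sequence[Example]) -> Candidate:
    """Return the candidate with the highest score on the given dev set.

    Passing English dev data reproduces the common zero-shot baseline;
    passing a small annotated pivot-language dev set is the alternative
    selection signal discussed in the abstract.
    """
    return max(candidates, key=lambda m: accuracy(m, dev_set))

The sketch only illustrates scoring-based selection; the paper goes further by training a predictor of target-language performance from each candidate's internal representations, which is what allows it to approach target-language dev-set selection without target-language labels.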