Paper Title
Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages
Authors
Abstract
Machine translation has seen rapid progress with the advent of Transformer-based models. These models have no explicit linguistic structure built into them, yet they may still implicitly learn structured relationships by attending to relevant tokens. We hypothesize that this structural learning could be made more robust by explicitly endowing Transformers with a structural bias, and we investigate two methods for building in such a bias. One method, the TP-Transformer, augments the traditional Transformer architecture to include an additional component to represent structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We test these methods on translating from English into morphologically rich languages, Turkish and Inuktitut, and consider both automatic metrics and human evaluations. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset. In sum, structural encoding methods make Transformers more sample-efficient, enabling them to perform better from smaller amounts of data.
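As a rough illustration of the first approach, the sketch below shows one way an attention head could bind its output ("filler") vector to a learned role vector via an elementwise product, in the spirit of tensor-product representations. This is a minimal sketch under our own assumptions; the module and parameter names are hypothetical and not the paper's exact TP-Transformer implementation.

```python
import torch
import torch.nn as nn

class TPAttentionHead(nn.Module):
    """Single attention head with a tensor-product-style structural bias.

    Hypothetical sketch: alongside the usual query/key/value projections,
    each position receives a learned "role" vector, and the attention
    output (the "filler") is bound to its role via an elementwise product.
    """

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        self.r = nn.Linear(d_model, d_head)  # role projection: the added structural component
        self.scale = d_head ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v, r = self.q(x), self.k(x), self.v(x), self.r(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        filler = attn @ v   # standard scaled dot-product attention output
        return filler * r   # bind filler to role (elementwise product)
```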
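For the second approach, the toy example below contrasts morphological segmentation with frequency-driven subword splitting on a Turkish word. The segmenter is a hypothetical table lookup standing in for a real morphological analyzer (e.g., an unsupervised tool such as Morfessor); the segmentation of "evlerimizde" itself is standard Turkish morphology.

```python
# The Turkish word "evlerimizde" ("in our houses") decomposes into
# linguistically meaningful morphemes, whereas a frequency-based subword
# model (e.g., BPE) may split it at arbitrary boundaries.
MORPH_SEGMENTS = {
    "evlerimizde": ["ev", "+ler", "+imiz", "+de"],  # house + PLURAL + our + LOCATIVE
}

def morph_tokenize(word: str) -> list[str]:
    """Toy morphological tokenizer: look the word up in a segmentation table.

    Real systems induce such segmentations with a morphological analyzer;
    this table is a stand-in for illustration only.
    """
    return MORPH_SEGMENTS.get(word, [word])

print(morph_tokenize("evlerimizde"))  # ['ev', '+ler', '+imiz', '+de']
```

Segmenting at morpheme boundaries like this gives the model tokens that recur systematically across the vocabulary of a morphologically rich language, which is the data-level structural bias the abstract describes.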