Paper Title

Learning to Scale Multilingual Representations for Vision-Language Tasks

Paper Authors

Andrea Burns, Donghyun Kim, Derry Wijaya, Kate Saenko, Bryan A. Plummer

Paper Abstract

Current multilingual vision-language models either require a large number of additional parameters for each supported language, or suffer performance degradation as languages are added. In this paper, we propose a Scalable Multilingual Aligned Language Representation (SMALR) that supports many languages with few model parameters without sacrificing downstream task performance. SMALR learns a fixed size language-agnostic representation for most words in a multilingual vocabulary, keeping language-specific features for just a few. We use a masked cross-language modeling loss to align features with context from other languages. Additionally, we propose a cross-lingual consistency module that ensures predictions made for a query and its machine translation are comparable. The effectiveness of SMALR is demonstrated with ten diverse languages, over twice the number supported in vision-language tasks to date. We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% with less than 1/5th the training parameters compared to other word embedding methods.
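
As a rough illustration of the hybrid vocabulary idea described in the abstract (a shared, fixed-size language-agnostic embedding for most words, with language-specific vectors kept only for a few), here is a minimal PyTorch sketch. It is not the authors' implementation; the class name, table sizes, and the mask-based lookup are illustrative assumptions.

```python
# Minimal sketch of a hybrid multilingual embedding in the spirit of SMALR.
# NOT the authors' code: sizes, names, and the masking scheme are assumptions.
import torch
import torch.nn as nn


class HybridMultilingualEmbedding(nn.Module):
    def __init__(self, shared_vocab_size=1000, specific_vocab_size=50, dim=300):
        super().__init__()
        # Language-agnostic table shared by most words across all languages.
        self.shared = nn.Embedding(shared_vocab_size, dim)
        # Small table reserved for the few words kept language-specific.
        self.specific = nn.Embedding(specific_vocab_size, dim)

    def forward(self, shared_ids, specific_ids, use_specific):
        # shared_ids, specific_ids: (batch, seq) indices into each table.
        # use_specific: (batch, seq) bool mask; True selects the language-specific vector.
        shared_vecs = self.shared(shared_ids)
        specific_vecs = self.specific(specific_ids)
        return torch.where(use_specific.unsqueeze(-1), specific_vecs, shared_vecs)


# Toy usage: the middle token is treated as language-specific.
embed = HybridMultilingualEmbedding()
shared_ids = torch.tensor([[3, 0, 42]])
specific_ids = torch.tensor([[0, 7, 0]])
mask = torch.tensor([[False, True, False]])
print(embed(shared_ids, specific_ids, mask).shape)  # torch.Size([1, 3, 300])
```

Because the shared table is reused across all supported languages, the parameter count grows only with the small language-specific tables as languages are added, which is the scaling property the abstract emphasizes.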
