Title

ORCA: A Challenging Benchmark for Arabic Language Understanding

Authors

AbdelRahim Elmadany, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed

Abstract

Due to their crucial role in all NLP, several benchmarks have been proposed to evaluate pretrained language models. In spite of these efforts, no public benchmark of diverse nature currently exists for evaluation of Arabic. This makes it challenging to measure progress for both Arabic and multilingual language models. This challenge is compounded by the fact that any benchmark targeting Arabic needs to take into account the fact that Arabic is not a single language but rather a collection of languages and varieties. In this work, we introduce ORCA, a publicly available benchmark for Arabic language understanding evaluation. ORCA is carefully constructed to cover diverse Arabic varieties and a wide range of challenging Arabic understanding tasks exploiting 60 different datasets across seven NLU task clusters. To measure current progress in Arabic NLU, we use ORCA to offer a comprehensive comparison between 18 multilingual and Arabic language models. We also provide a public leaderboard with a unified single-number evaluation metric (ORCA score) to facilitate future research.
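
The abstract mentions a unified single-number evaluation metric (the ORCA score) backing the public leaderboard. As a rough illustration only, here is a minimal Python sketch assuming the score is a macro-average: per-task metrics are first averaged within each task cluster, then the cluster means are averaged. The function name `orca_style_score`, the cluster names, and all numbers below are hypothetical, not taken from the paper.

```python
# Minimal sketch of a unified single-number score in the style of the
# ORCA score, under the assumption that it is a macro-average of
# per-cluster means. All names and values here are illustrative.

from statistics import mean


def orca_style_score(cluster_scores: dict[str, list[float]]) -> float:
    """Average metrics within each task cluster, then macro-average
    the cluster means into a single number."""
    per_cluster = {name: mean(scores) for name, scores in cluster_scores.items()}
    return mean(per_cluster.values())


# Hypothetical example: three of the seven NLU task clusters, with
# made-up per-dataset scores (e.g., macro-F1 on individual datasets).
example = {
    "sentence_classification": [0.81, 0.77, 0.84],
    "text_classification": [0.69, 0.72],
    "structured_prediction": [0.88],
}

print(f"ORCA-style score: {orca_style_score(example):.4f}")
```

Averaging within clusters before averaging across them keeps a cluster with many datasets from dominating the overall number, which is one plausible reason a benchmark spanning seven task clusters of uneven size would report a macro-style aggregate.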
