Language Diversity: Visible to Humans, Exploitable by Machines

Paper Title

Language Diversity: Visible to Humans, Exploitable by Machines

Authors

Gábor Bella, Erdenebileg Byambadorj, Yamini Chandrashekar, Khuyagbaatar Batsuren, Danish Ashgar Cheema, Fausto Giunchiglia

Abstract

The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract notion of diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.
