Paper Title
Language Diversity: Visible to Humans, Exploitable by Machines
Paper Authors
Paper Abstract
The Universal Knowledge Core (UKC) is a large multilingual lexical database with a focus on language diversity and covering over a thousand languages. The aim of the database, as well as its tools and data catalogue, is to make the somewhat abstract notion of diversity visually understandable for humans and formally exploitable by machines. The UKC website lets users explore millions of individual words and their meanings, but also phenomena of cross-lingual convergence and divergence, such as shared interlingual meanings, lexicon similarities, cognate clusters, or lexical gaps. The UKC LiveLanguage Catalogue, in turn, provides access to the underlying lexical data in a computer-processable form, ready to be reused in cross-lingual applications.