Paper Title

BIOS: An Algorithmically Generated Biomedical Knowledge Graph

Authors

Yu, Sheng; Yuan, Zheng; Xia, Jun; Luo, Shengxuan; Ying, Huaiyuan; Zeng, Sihang; Ren, Jingyi; Yuan, Hongyi; Zhao, Zhengyun; Lin, Yucong; Lu, Keming; Wang, Jing; Xie, Yutao; Shum, Heung-Yeung

Abstract

Biomedical knowledge graphs (BioMedKGs) are essential infrastructures for biomedical and healthcare big data and artificial intelligence (AI), facilitating natural language processing, model development, and data exchange. For decades, these knowledge graphs have been developed via expert curation; however, this method can no longer keep up with today's AI development, and a transition to algorithmically generated BioMedKGs is necessary. In this work, we introduce the Biomedical Informatics Ontology System (BIOS), the first large-scale publicly available BioMedKG generated completely by machine learning algorithms. BIOS currently contains 4.1 million concepts, 7.4 million terms in two languages, and 7.3 million relation triplets. We present the methodology for developing BIOS, including the curation of raw biomedical terms, computational identification of synonymous terms and aggregation of these terms to create concept nodes, semantic type classification of the concepts, relation identification, and biomedical machine translation. We provide statistics on the current BIOS content and perform preliminary assessments of term quality, synonym grouping, and relation extraction. The results suggest that machine learning-based BioMedKG development is a viable alternative to traditional expert curation.
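The abstract mentions identifying synonymous terms computationally and then aggregating them into concept nodes. One way to sketch that aggregation step is shown below. It assumes a hypothetical upstream classifier has already scored candidate term pairs. Pairs at or above a threshold are treated as graph edges, and each connected component, found with union-find, becomes one concept. The function `group_synonyms`, the `threshold` value, and the example scores are illustrative assumptions. This is not the BIOS implementation.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over hashable items, with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen items as their own root.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_synonyms(terms, scored_pairs, threshold=0.5):
    """Group terms into concept clusters (hypothetical helper).

    scored_pairs: iterable of (term_a, term_b, score), where score is an
    assumed synonym probability from some upstream model.
    Returns a sorted list of sorted term lists, one list per concept.
    """
    uf = UnionFind()
    for t in terms:
        uf.find(t)  # terms with no accepted pair become singleton concepts
    for a, b, score in scored_pairs:
        if score >= threshold:
            uf.union(a, b)
    clusters = defaultdict(list)
    for t in list(uf.parent):
        clusters[uf.find(t)].append(t)
    return sorted(sorted(c) for c in clusters.values())


# Illustrative input (made-up scores, not BIOS data).
EXAMPLE_TERMS = ["myocardial infarction", "heart attack", "MI",
                 "cardiac infarction", "diabetes mellitus", "hypertension"]
EXAMPLE_PAIRS = [
    ("myocardial infarction", "heart attack", 0.97),
    ("heart attack", "cardiac infarction", 0.91),
    ("MI", "myocardial infarction", 0.88),
    ("hypertension", "diabetes mellitus", 0.12),  # below threshold, not merged
]

if __name__ == "__main__":
    for concept in group_synonyms(EXAMPLE_TERMS, EXAMPLE_PAIRS):
        print(concept)
```

Grouping is transitive, so a single false-positive pair can merge two otherwise separate concepts. For that reason, the threshold (or an additional cluster-level check) matters in any pipeline built along these lines.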