Paper title

Feature extraction using Spectral Clustering for Gene Function Prediction using Hierarchical Multi-label Classification

Authors

Miguel Romero, Oscar Ramírez, Jorge Finke, Camilo Rocha

Abstract

Gene annotation addresses the problem of predicting unknown associations between genes and functions (e.g., biological processes) of a specific organism. Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents a novel in silico approach to the annotation problem that combines cluster analysis and hierarchical multi-label classification (HMC). The approach uses spectral clustering to extract new features from the gene co-expression network (GCN) and enrich the prediction task. HMC is used to build multiple estimators that take into account the hierarchical structure of gene functions. The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world. The results illustrate how in silico approaches are key to reducing the time and cost of gene annotation. More specifically, they highlight the importance of: (i) building new features that represent the structure of gene relationships in GCNs to annotate genes; and (ii) taking into account the structure of biological processes to obtain consistent predictions.
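The abstract does not specify how the spectral features or the hierarchy-aware predictions are computed, so the following Python sketch is only one illustrative reading. It assumes NumPy, a symmetric non-negative GCN adjacency matrix, and the Ng-Jordan-Weiss variant of normalised spectral clustering: genes are embedded with the eigenvectors of the smallest eigenvalues of the normalised Laplacian, grouped with k-means, and their one-hot cluster memberships serve as extra features. It also caps each function's score by its parents' scores, one common way to keep HMC outputs consistent with a term hierarchy that is a DAG (a term may have several parents). All function names, the toy network and the scores are hypothetical; the authors' actual features and estimators may differ.

```python
from __future__ import annotations

import numpy as np


def spectral_embedding(adj: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: rows of the k eigenvectors with the smallest
    eigenvalues of the symmetric normalised Laplacian, each row scaled to
    unit length (Ng-Jordan-Weiss variant)."""
    deg = adj.sum(axis=1)
    # Isolated genes (degree 0) get zero scaling instead of dividing by 0.
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so keep the first k columns.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms > 0, norms, 1.0)


def kmeans(x: np.ndarray, k: int, iters: int = 100) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-first seeding."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old centre if a cluster becomes empty.
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_features(adj: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: one-hot cluster membership per gene, usable as
    extra inputs for the function classifiers."""
    labels = kmeans(spectral_embedding(adj, k), k)
    return np.eye(k, dtype=int)[labels]


def enforce_hierarchy(scores: dict[str, float],
                      parents: dict[str, list[str]]) -> dict[str, float]:
    """Cap each term's score by its parents' (already capped) scores, so a
    child function never scores higher than any of its ancestors."""
    out: dict[str, float] = {}

    def resolve(term: str) -> float:
        if term not in out:
            caps = [resolve(p) for p in parents.get(term, [])]
            out[term] = min([scores[term], *caps])
        return out[term]

    for term in scores:
        resolve(term)
    return out


# Toy GCN (hypothetical): two triangles of co-expressed genes joined by a weak edge.
gcn = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

print(cluster_features(gcn, k=2))
print(enforce_hierarchy({"root": 0.9, "a": 0.95, "b": 0.4, "c": 0.7},
                        {"a": ["root"], "b": ["root"], "c": ["a", "b"]}))
```

The farthest-first seeding keeps the k-means step deterministic for this small example; for real GCNs, a library implementation such as scikit-learn's `SpectralClustering` is likely more robust.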