Paper title
i6mA-CNN: a convolution based computational approach towards identification of DNA N6-methyladenine sites in rice genome
Paper authors
Paper abstract
DNA N6-methylation (6mA) of adenine nucleotides is a post-replication modification that is responsible for many biological functions. Experimental methods for genome-wide 6mA site detection are expensive and labour-intensive. Automated and accurate computational methods can help to identify 6mA sites in long genomes, saving significant time and money. Our study develops a convolutional neural network based tool, i6mA-CNN, capable of identifying 6mA sites in the rice genome. Our model coordinates among multiple types of features, such as a PseAAC-inspired customized feature vector, multiple one-hot representations, and dinucleotide physicochemical properties. It achieves an area under the receiver operating characteristic curve of 0.98 with an overall accuracy of 0.94 using 5-fold cross-validation on the benchmark dataset. Finally, we evaluate our model on two other plant genome 6mA site identification datasets besides rice. The results suggest that our proposed tool is able to generalize its 6mA site identification ability to plant genomes irrespective of plant species. The web tool for this research can be found at: https://cutt.ly/Co6KuWG. Supplementary data (benchmark dataset, independent test dataset, comparison purpose dataset, trained model, physicochemical property values, attention mechanism details for motif finding) are available at https://cutt.ly/PpDdeDH.
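To make the one-hot representation mentioned in the abstract concrete, the following is a minimal sketch of how a DNA sequence could be turned into a numeric matrix suitable as convolutional input. It is an illustration only, not the authors' implementation: the helper name `one_hot_encode`, the column order `ACGT`, and the choice to map unknown symbols to an all-zero row are all assumptions.

```python
# Illustrative sketch only; not the i6mA-CNN code.
# Assumption: columns are ordered A, C, G, T.
ALPHABET = "ACGT"


def one_hot_encode(sequence: str):
    """Return one row per nucleotide, with a 1 in the column of that base."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in sequence.upper():
        row = [0] * len(ALPHABET)
        # Assumption: symbols outside ACGT (for example N) stay all-zero.
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("GATN")
for base, row in zip("GATN", encoded):
    print(base, row)
```

Each sequence of length L becomes an L x 4 matrix, which a one-dimensional convolution can scan along the sequence axis. The other feature types named in the abstract (the PseAAC-inspired vector and the dinucleotide physicochemical properties) would need their own encodings, which are not sketched here.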