Paper title
Lipophilicity Prediction with Multitask Learning and Molecular Substructures Representation
Authors
Abstract
Lipophilicity is one of the factors that determine the permeability of the cell membrane to a drug molecule, so accurate lipophilicity prediction is an essential step in the development of new drugs. In this paper, we introduce a novel approach to encoding additional graph information by extracting molecular substructures. By adding a set of generalized atomic features of these substructures to an established Directed Message Passing Neural Network (D-MPNN), we achieve new state-of-the-art results in predicting the two main lipophilicity coefficients, logP and logD. We further improve the approach by using multitask learning to predict logP and logD values simultaneously. Additionally, we present a study of model performance on symmetric and asymmetric molecules, which may yield insights for further research.
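The multitask idea in the abstract, one model trained on logP and logD at once, can be illustrated with a small loss function. This is a minimal sketch, not the authors' implementation. It assumes that some molecules carry only one of the two labels and that missing labels are simply skipped when averaging; the abstract does not say how the paper handles this. The function name `masked_multitask_mse` and the toy numbers are hypothetical.

```python
def masked_multitask_mse(predictions, targets):
    """Mean squared error over every available (molecule, task) label.

    predictions: list of (logp_pred, logd_pred) tuples, one per molecule
    targets:     list of (logp_true, logd_true) tuples; None marks a
                 missing label (assumption: not every molecule has both)
    """
    total, count = 0.0, 0
    for pred_row, true_row in zip(predictions, targets):
        for pred, true in zip(pred_row, true_row):
            if true is None:
                # Skip tasks without a measured value for this molecule.
                continue
            total += (pred - true) ** 2
            count += 1
    if count == 0:
        raise ValueError("no labelled targets in batch")
    return total / count


# Hypothetical toy batch: three molecules, two tasks (logP, logD).
preds = [(2.0, 1.5), (0.5, 0.0), (3.0, 2.0)]
labels = [(2.5, None), (None, 0.5), (3.0, 1.0)]
loss = masked_multitask_mse(preds, labels)
print(loss)  # 0.375
```

Averaging over available labels lets molecules with a single measurement still contribute to the shared representation, which is one common motivation for training the two tasks together.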