Paper Title

Exploring Chemical Space with Score-based Out-of-distribution Generation

Authors

Seul Lee, Jaehyeong Jo, Sung Ju Hwang

Abstract

A well-known limitation of existing molecular generative models is that the generated molecules highly resemble those in the training set. To generate truly novel molecules that may have even better properties for de novo drug discovery, more powerful exploration in the chemical space is necessary. To this end, we propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control in the generative stochastic differential equation (SDE) with simple control of a hyperparameter, and thus requires no additional cost. Since some novel molecules may not meet the basic requirements of real-world drugs, MOOD performs conditional generation by utilizing the gradients from a property predictor that guides the reverse-time diffusion process to high-scoring regions according to target properties such as protein-ligand interactions, drug-likeness, and synthesizability. This allows MOOD to search for novel and meaningful molecules rather than generating unseen yet trivial ones. We experimentally validate that MOOD is able to explore the chemical space beyond the training distribution, generating molecules that outscore ones found with existing methods, and even the top 0.01% of the original training pool. Our code is available at https://github.com/SeulLee05/MOOD.
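The abstract describes two controls on the reverse-time SDE: a single hyperparameter that lets sampling move away from the training distribution, and property-predictor gradients that steer sampling toward high-scoring regions. The sketch below illustrates both ideas on a 1-D toy problem rather than on molecular graphs. It assumes a variance-exploding SDE, an analytic Gaussian score standing in for a trained score network, and a hypothetical quadratic property predictor. The OOD control is modeled here by simply scaling the score by a hypothetical `score_weight`; this is an illustrative stand-in, not necessarily MOOD's exact formulation (the official repository linked in the abstract holds the actual implementation).

```python
import numpy as np

# Hypothetical toy setup: "training data" ~ N(DATA_MEAN, DATA_STD^2) in 1-D.
DATA_MEAN, DATA_STD = 0.0, 1.0
SIGMA_MIN, SIGMA_MAX = 0.01, 5.0


def sigma(t):
    # Variance-exploding (VE) noise schedule: sigma_min * (sigma_max / sigma_min) ** t
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def g_squared(t):
    # Squared diffusion coefficient of the VE SDE: g(t)^2 = d[sigma(t)^2] / dt
    return 2.0 * sigma(t) ** 2 * np.log(SIGMA_MAX / SIGMA_MIN)


def data_score(x, t):
    # Exact score of the perturbed marginal N(DATA_MEAN, DATA_STD^2 + sigma(t)^2);
    # a stand-in for a trained score network s_theta(x, t).
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + sigma(t) ** 2)


def property_grad(x, target):
    # Gradient of a hypothetical property predictor y(x) = -(x - target)^2 / 2,
    # which scores samples higher the closer they are to `target`.
    return target - x


def sample(n, score_weight=1.0, guidance=0.0, target=3.0, steps=1000, seed=0):
    """Euler-Maruyama integration of the reverse-time VE SDE from t=1 to t=0.

    score_weight: illustrative OOD knob (assumption); values < 1 weaken the
        pull toward the training distribution.
    guidance: weight on the property-predictor gradient.
    """
    rng = np.random.default_rng(seed)
    h = 1.0 / steps
    # Start from the perturbed marginal at t=1, which acts as the prior.
    x = DATA_MEAN + np.sqrt(DATA_STD ** 2 + SIGMA_MAX ** 2) * rng.standard_normal(n)
    for i in range(steps, 0, -1):
        t = i * h
        total_score = score_weight * data_score(x, t) + guidance * property_grad(x, target)
        g2 = g_squared(t)
        # Reverse-time step: x <- x + g^2 * score * h + g * sqrt(h) * z
        x = x + g2 * total_score * h + np.sqrt(g2 * h) * rng.standard_normal(n)
    return x


baseline = sample(2000)                    # plain reverse diffusion
explore = sample(2000, score_weight=0.5)   # weakened score: wider spread
guided = sample(2000, guidance=1.0)        # predictor-guided toward target
print(baseline.std(), explore.std(), guided.mean())
```

In this toy, setting `score_weight` below 1 widens the spread of samples beyond the training distribution, while a positive `guidance` shifts them toward `target`. In a real model both weights trade novelty against validity and would need tuning.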