GPU-optimized Approaches to Molecular Docking-based Virtual Screening in Drug Discovery: A Comparative Analysis

Paper Title

GPU-optimized Approaches to Molecular Docking-based Virtual Screening in Drug Discovery: A Comparative Analysis

Authors

Vitali, Emanuele; Ficarelli, Federico; Bisson, Mauro; Gadioli, Davide; Fatica, Massimiliano; Beccari, Andrea R.; Palermo, Gianluca

Abstract

COVID-19 has shown the importance of responding quickly to pandemics. Finding a novel drug is a very long and complex procedure, and computer simulations can accelerate its preliminary phases. In particular, virtual screening is an in-silico phase that filters a large set of possible drug candidates down to a manageable number. This paper presents two GPU-optimized implementations of a virtual screening algorithm targeting novel GPU architectures, together with a comparative analysis of them. The first adopts a traditional approach that spreads the computation required to evaluate a single molecule across the entire GPU. The second uses a batched approach that exploits the parallel architecture of the GPU to evaluate more molecules in parallel, without regard to the latency of processing any single molecule. The paper describes the advantages and disadvantages of the proposed solutions, highlighting the implementation details that impact performance. Experimental results on NVIDIA A100 GPUs highlight the different performance of the two methods across several target molecule databases. Both implementations depend strongly on the data to be processed: in both cases, performance improves as the size of the target molecules (number of atoms and rotatable bonds) decreases. The two methods also behave differently with respect to the size of the molecule database to be screened. The latency-oriented implementation reaches its throughput plateau sooner (with fewer molecules), whereas the batched one requires a larger set of molecules; however, its performance after the initial transient period is much higher (up to 5x speed-up). Finally, to assess the efficiency of both implementations, we analyzed their workload characteristics in depth using the instruction roofline methodology.
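The instruction roofline methodology named above places a kernel on a chart whose ceiling is the smaller of two roofs: a flat compute roof (peak warp instructions issued per second) and a slanted memory roof (instruction intensity, i.e. warp instructions per memory transaction, times the peak transaction rate). The following is a minimal sketch of that calculation, assuming the usual formulation with 32-byte transactions and one warp instruction issued per scheduler per cycle. The A100 figures (108 SMs, 4 warp schedulers per SM, 1.41 GHz boost clock, 1555 GB/s for the 40 GB HBM2 part) are assumptions for illustration, `GpuPeaks` and `instruction_roofline` are hypothetical names, and the kernel counter values are invented; in practice such counters would come from a profiler such as NVIDIA Nsight Compute. Nothing here reproduces the authors' measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPeaks:
    """Peak rates used as roofline ceilings (all values are assumptions)."""
    sms: int                # streaming multiprocessors
    schedulers_per_sm: int  # warp schedulers per SM, assumed to issue 1 warp instr/cycle
    clock_ghz: float        # clock used for the compute ceiling, in GHz
    mem_bw_gbs: float       # memory bandwidth, in GB/s
    txn_bytes: int = 32     # bytes per memory transaction (one 32-byte sector)

    @property
    def peak_warp_gips(self) -> float:
        # Compute ceiling: billions of warp instructions issued per second.
        return self.sms * self.schedulers_per_sm * self.clock_ghz

    @property
    def peak_gtxn_s(self) -> float:
        # Memory ceiling: billions of memory transactions per second.
        return self.mem_bw_gbs / self.txn_bytes


def instruction_roofline(peaks, warp_instructions, transactions, seconds):
    """Place one kernel on the instruction roofline.

    Returns (intensity, achieved warp GIPS, attainable warp GIPS, bound).
    """
    intensity = warp_instructions / transactions    # warp instructions per transaction
    achieved = warp_instructions / seconds / 1e9    # measured warp GIPS
    memory_ceiling = intensity * peaks.peak_gtxn_s  # slanted (bandwidth) roof
    attainable = min(peaks.peak_warp_gips, memory_ceiling)
    bound = "memory" if memory_ceiling < peaks.peak_warp_gips else "compute"
    return intensity, achieved, attainable, bound


if __name__ == "__main__":
    # Assumed A100 40 GB figures; the kernel counters below are hypothetical.
    a100 = GpuPeaks(sms=108, schedulers_per_sm=4, clock_ghz=1.41, mem_bw_gbs=1555.0)
    print(round(a100.peak_warp_gips, 2))  # 609.12
    i, ach, att, bound = instruction_roofline(a100, 5e9, 1e9, 0.05)
    print(f"intensity={i:.1f} achieved={ach:.1f} attainable={att:.1f} ({bound}-bound)")
```

Because the ceiling is the minimum of the two roofs, raising instruction intensity (fewer memory transactions per warp instruction) helps only until the kernel reaches the flat compute roof; a measured point far below its attainable ceiling points to limits other than issue rate or bandwidth.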
