Paper title
Topology-Driven Generative Completion of Lacunae in Molecular Data
Paper authors
Paper abstract
We introduce an approach to the targeted completion of lacunae in molecular data sets that is driven by topological data analysis, such as the Mapper algorithm. Lacunae are filled in using scaffold-constrained generative models trained with different scoring functions. The approach enables the addition of links and vertices to skeletonized representations of the data, such as a Mapper graph, and falls into the broad category of network completion methods. We illustrate the topology-driven data completion strategy by creating a lacuna in a data set of onium cations extracted from USPTO patents and then repairing it.