Title
Predicting aggregate morphology of sequence-defined macromolecules with Recurrent Neural Networks
Authors
Abstract
Self-assembly of dilute sequence-defined macromolecules is a complex phenomenon in which the local arrangement of chemical moieties can lead to the formation of long-range structure. Because this structure depends on the sequence, a mapping between the two must exist, yet it has so far been difficult to model. Predicting the aggregation behavior of these macromolecules is challenging because of the lack of effective order parameters, the vast design space, inherent variability, and the high computational cost of currently available simulation techniques. Here, we use supervised machine learning to accurately predict the morphology of aggregates self-assembled from sequence-defined macromolecules. We find that regression models with implicit representation learning perform significantly better than those based on engineered features such as $k$-mer counting, and that a regressor based on a recurrent neural network (RNN) performs best of the nine model architectures we tested. Furthermore, we demonstrate high-throughput screening of monomer sequences with the regression model to identify candidates for self-assembly into selected morphologies. The strategy successfully identified multiple suitable sequences in every test we performed, and we hope the insights gained here can be extended to increasingly complex design scenarios in the future, such as sequence design under polydispersity and varying environmental conditions.
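The abstract contrasts engineered features such as $k$-mer counting with learned representations, and describes screening candidate sequences with a trained regressor. The sketch below illustrates both ideas under stated assumptions: a hypothetical two-letter monomer alphabet `AB`, a hypothetical scalar morphology descriptor, and a placeholder `toy_predict` that stands in for a trained model. It shows the $k$-mer baseline, not the RNN regressor the abstract reports as the best performer, and it is not the authors' pipeline.

```python
from collections import Counter
from itertools import product
from typing import Callable, Iterable

# Hypothetical two-letter monomer alphabet (an assumption for illustration).
ALPHABET = "AB"


def kmer_counts(sequence: str, k: int, alphabet: str = ALPHABET) -> list[int]:
    """Engineered feature: count each possible k-mer over overlapping windows.

    The vector is ordered over all len(alphabet)**k k-mers, so sequences of
    any length map to feature vectors of the same size.
    """
    windows = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [windows["".join(p)] for p in product(alphabet, repeat=k)]


def toy_predict(sequence: str) -> float:
    """Placeholder for a trained regressor: returns the fraction of 'A' monomers.

    A real screen would call the fitted model (e.g. an RNN) here instead.
    """
    return sequence.count("A") / len(sequence)


def screen(candidates: Iterable[str], predict: Callable[[str], float],
           target: float, tolerance: float) -> list[tuple[str, float]]:
    """Keep sequences whose predicted descriptor lies within `tolerance` of `target`."""
    hits = []
    for seq in candidates:
        score = predict(seq)
        if abs(score - target) <= tolerance:
            hits.append((seq, score))
    # Closest predictions first.
    return sorted(hits, key=lambda hit: abs(hit[1] - target))


if __name__ == "__main__":
    print(kmer_counts("AABAB", k=2))  # [1, 2, 1, 0] for AA, AB, BA, BB
    library = ["".join(p) for p in product(ALPHABET, repeat=8)]
    hits = screen(library, toy_predict, target=0.5, tolerance=0.0)
    print(len(hits), "of", len(library), "candidates pass the filter")
```

Exhaustive enumeration is only practical for short chains (here $2^8 = 256$ candidates); because the library grows as $2^N$ with chain length $N$, longer sequences would have to be sampled rather than enumerated.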