Paper Title
ET-AL: Entropy-Targeted Active Learning for Bias Mitigation in Materials Data
Paper Authors
Paper Abstract
Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.
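To make the idea of an entropy-based diversity measure concrete, here is a minimal sketch, not the authors' implementation. It assumes that the diversity of a crystal system can be approximated by the Shannon entropy of a histogram of one scalar property, and that the system with the lowest entropy is the one to target for new data. The function names, the `bin_width` parameter, and the toy numbers are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=0.5):
    """Shannon entropy (in nats) of values grouped into fixed-width bins.

    Assumption: a simple histogram stands in for whatever density
    estimate the actual method uses.
    """
    if not values:
        return 0.0
    # Assign each value to a bin index, then count the occupants of each bin.
    counts = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return sum(-(c / n) * math.log(c / n) for c in counts.values())


def least_diverse_system(data, bin_width=0.5):
    """Return the crystal system with the lowest entropy, plus all entropies."""
    entropies = {
        name: shannon_entropy(vals, bin_width) for name, vals in data.items()
    }
    return min(entropies, key=entropies.get), entropies


# Toy, made-up property values per crystal system (illustration only).
toy = {
    "cubic": [-3.1, -2.2, -1.4, -0.6, 0.3, 1.1],
    "triclinic": [-1.1, -1.2, -1.3, -1.05],
}

target, entropies = least_diverse_system(toy)
print(target, entropies)
```

In this toy setup, every triclinic value falls into the same bin, so its entropy is zero and it is selected as the underrepresented system. An active learning loop in this spirit would then prefer candidate materials whose addition is expected to raise the entropy of that system.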