Paper title
LEAPME: Learning-based Property Matching with Embeddings
Authors
Abstract
Data integration tasks such as the creation and extension of knowledge graphs involve fusing heterogeneous entities from many sources. Matching and fusing such entities also requires matching and combining their properties (attributes). However, previous schema matching approaches mostly focus on only two sources and often rely on simple similarity measures. They thus struggle in challenging use cases such as the integration of heterogeneous product entities from many sources. We therefore present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings) that utilizes numerous features of both property names and instance values. The approach makes heavy use of word embeddings to better exploit the domain-specific semantics of both property names and instance values, and supervised machine learning helps exploit the predictive power of these embeddings. Our comparative evaluation against five baselines on several multi-source datasets with real-world data shows the high effectiveness of LEAPME. We also show that our approach remains effective even when training data from another domain is used (transfer learning).
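The general idea of embedding-based, supervised property matching can be illustrated with a minimal sketch. It is not the LEAPME implementation: the toy vectors in `TOY_VECTORS`, the two cosine-similarity features, the plain logistic regression classifier, and all function names are assumptions made for illustration only. The abstract states only that numerous features of property names and instance values are used together with word embeddings and supervised learning.

```python
import math

# Hypothetical toy word vectors (assumption); a real setup would load
# pretrained word embeddings instead.
TOY_VECTORS = {
    "weight": [1.0, 0.0, 0.0], "mass": [0.9, 0.1, 0.0], "kg": [0.8, 0.0, 0.1],
    "color": [0.0, 1.0, 0.0], "colour": [0.0, 0.9, 0.1],
    "red": [0.0, 0.8, 0.0], "blue": [0.0, 0.7, 0.1],
    "price": [0.0, 0.0, 1.0], "cost": [0.1, 0.0, 0.9], "usd": [0.0, 0.1, 0.8],
}
DIM = 3


def embed(text):
    """Average the vectors of known tokens; zero vector if none is known."""
    vecs = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_features(prop_a, prop_b):
    """Features for a property pair; each property is (name, instance_values)."""
    (name_a, values_a), (name_b, values_b) = prop_a, prop_b
    name_sim = cosine(embed(name_a), embed(name_b))
    value_sim = cosine(embed(" ".join(values_a)), embed(" ".join(values_b)))
    return [name_sim, value_sim]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, lr=0.5, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    X = [pair_features(a, b) for a, b in pairs]
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias) - y
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            bias -= lr * err
    return w, bias


def match_score(model, prop_a, prop_b):
    """Predicted probability that two properties describe the same attribute."""
    w, bias = model
    x = pair_features(prop_a, prop_b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)


# Toy properties from different (hypothetical) sources.
weight = ("weight", ["2 kg", "5 kg"])
mass = ("mass", ["1 kg"])
color = ("color", ["red"])
colour = ("colour", ["blue"])
price = ("price", ["10 usd"])
cost = ("cost", ["25 usd"])

# Labeled pairs: 1 = matching properties, 0 = non-matching.
train_pairs = [(weight, mass), (color, colour), (weight, color),
               (mass, price), (color, price)]
train_labels = [1, 1, 0, 0, 0]
model = train(train_pairs, train_labels)

print("price vs cost:", match_score(model, price, cost))
print("weight vs cost:", match_score(model, weight, cost))
```

The classifier never sees `price` paired with `cost` during training, yet the pair should score as a match because its name and value similarities resemble those of the labeled matching pairs. Replacing the hand-written vectors with pretrained embeddings and adding further features would be the natural next step in this sketch.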