Paper title

Applying recent advances in Visual Question Answering to Record Linkage

Paper author

Smilevski, Marko

Paper abstract

Multi-modal Record Linkage is the process of matching multi-modal records from multiple sources that represent the same entity. This field has not yet been explored in research, and we propose two solutions based on Deep Learning architectures inspired by recent work in Visual Question Answering. The neural networks we propose use two different fusion modules, the Recurrent Neural Network + Convolutional Neural Network fusion module and the Stacked Attention Network fusion module, which jointly combine the visual and textual data of the records. The output of these fusion modules is the input to a Siamese Neural Network that computes the similarity of the records. We train these solutions on data from the Avito Duplicate Advertisements Detection dataset and conclude from the experiments that the Recurrent Neural Network + Convolutional Neural Network fusion module outperforms a simple model that uses hand-crafted features. We also find that the Recurrent Neural Network + Convolutional Neural Network fusion module classifies dissimilar advertisements as similar more frequently when their average description length exceeds 40 words. We conclude that this happens because longer advertisements have a different distribution than the shorter advertisements, which are more prevalent in the dataset. Finally, we conclude that further research is needed on the Stacked Attention Network to explore the effects of the visual data on the performance of the fusion modules.
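
The data flow described in the abstract (fuse each record's text and image representation, then compare the two fused vectors with a shared-weight Siamese comparison) can be illustrated with a minimal sketch. This is not the paper's implementation. The abstract does not say how the fusion module combines the two modalities or how the Siamese network scores similarity, so the element-wise product and the cosine score below are assumptions, and the names `fuse`, `cosine_similarity`, and `siamese_similarity` are hypothetical. The text and image vectors stand in for the outputs of a recurrent and a convolutional encoder, which are not modeled here.

```python
import math


def fuse(text_vec, image_vec):
    """Combine one record's text and image vectors into a single vector.

    Assumption: element-wise product. The abstract does not specify the
    fusion operation.
    """
    if len(text_vec) != len(image_vec):
        raise ValueError("text and image vectors must have the same length")
    return [t * v for t, v in zip(text_vec, image_vec)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def siamese_similarity(record_a, record_b):
    """Score two records, each given as a (text_vec, image_vec) pair.

    Both records pass through the same fuse function, which mirrors the
    shared-weight idea of a Siamese network. Assumption: the score is the
    cosine similarity of the fused vectors.
    """
    fused_a = fuse(*record_a)
    fused_b = fuse(*record_b)
    return cosine_similarity(fused_a, fused_b)


if __name__ == "__main__":
    # Toy records: (text embedding, image embedding), values are made up.
    ad_1 = ([1.0, 0.0], [1.0, 1.0])
    ad_2 = ([0.0, 1.0], [1.0, 1.0])
    print(siamese_similarity(ad_1, ad_1))
    print(siamese_similarity(ad_1, ad_2))
```

In a trained model, the encoders and the fusion step would carry learned parameters shared by both branches, and a threshold on the similarity score would decide whether two advertisements are duplicates.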
