Paper Title


Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension

Authors

Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu

Abstract


While neural networks with attention mechanisms have achieved superior performance on many natural language processing tasks, it remains unclear to what extent learned attention resembles human visual attention. In this paper, we propose a new method that leverages eye-tracking data to investigate the relationship between human visual attention and neural attention in machine reading comprehension. To this end, we introduce a novel 23-participant eye-tracking dataset, MQA-RC, in which participants read movie plots and answered pre-defined questions. We compare state-of-the-art networks based on long short-term memory (LSTM), convolutional neural network (CNN), and XLNet Transformer architectures. We find that for the LSTM and CNN models, higher similarity to human attention significantly correlates with performance. However, we show this relationship does not hold for the XLNet models, even though XLNet performs best on this challenging task. Our results suggest that different architectures learn rather different neural attention strategies, and that similarity of neural to human attention does not guarantee the best performance.
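The abstract does not specify the exact similarity metric used to relate per-token human fixation data to model attention weights. As a purely illustrative, hedged sketch (the metric, token granularity, and all numbers below are assumptions, not the paper's method), one common choice is Spearman rank correlation between the two per-token signals:

```python
# Hedged sketch: comparing a model's attention weights with human gaze
# durations over the same token sequence. Spearman rank correlation is
# one common similarity metric; the paper's actual metric is not stated
# in the abstract. All data below are made up for illustration.

def rankdata(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation computed on the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: per-token human fixation durations (ms) and model
# attention weights for the same six-token passage.
human_fixations = [120, 340, 80, 510, 260, 90]
model_attention = [0.05, 0.30, 0.02, 0.45, 0.15, 0.03]

rho = spearman(human_fixations, model_attention)
print(f"Spearman correlation: {rho:.3f}")  # → Spearman correlation: 1.000
```

In this toy case the two signals rank every token identically, so the correlation is exactly 1.0; real gaze and attention data would of course correlate only partially.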
