Paper Title
Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension
Paper Authors
Paper Abstract
Recently, attention-enhanced multi-layer encoders, such as the Transformer, have been extensively studied in Machine Reading Comprehension (MRC). To predict the answer, it is common practice to employ a predictor that draws information only from the final encoder layer, which generates the \textit{coarse-grained} representations of the source sequences, i.e., the passage and the question. Previous studies have shown that the representation of the source sequence shifts from \textit{fine-grained} to \textit{coarse-grained} as the encoding layers deepen. It is generally believed that, as the number of layers in a deep neural network grows, the encoding process increasingly aggregates relevant information at each position, yielding more \textit{coarse-grained} representations and raising each position's similarity to other positions (referred to as homogeneity). Such a phenomenon can mislead the model into wrong judgments and thus degrade performance. To this end, we propose a novel approach called Adaptive Bidirectional Attention, which adaptively exposes source representations from different encoder levels to the predictor. Experimental results on the benchmark dataset SQuAD 2.0 demonstrate the effectiveness of our approach, surpassing the previous state-of-the-art model by 2.5$\%$ EM and 2.3$\%$ F1 scores.
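The abstract's core idea is that the predictor should not rely solely on the final (coarse-grained) encoder layer, but adaptively combine representations from all layers. A minimal sketch of one way such layer fusion could work is shown below; this is an illustration, not the paper's actual Adaptive Bidirectional Attention mechanism, and the function `adaptive_layer_fusion` and its scoring vector `w` are hypothetical stand-ins for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_layer_fusion(layer_states, w):
    """Fuse per-layer encoder representations with position-wise
    adaptive weights (illustrative sketch, not the paper's method).

    layer_states: (L, T, d) hidden states from L encoder layers
                  for a sequence of T tokens.
    w:            (d,) scoring vector standing in for learned parameters.
    Returns:      (T, d) fused representation passed to the predictor.
    """
    # Score each layer's representation at every token position.
    scores = np.einsum('ltd,d->lt', layer_states, w)   # (L, T)
    # Normalize over layers: each position gets its own mixture
    # of fine-grained (early) and coarse-grained (late) layers.
    alpha = softmax(scores, axis=0)                    # (L, T)
    # Position-wise weighted sum over the L layers.
    return np.einsum('lt,ltd->td', alpha, layer_states)

rng = np.random.default_rng(0)
states = rng.normal(size=(4, 6, 8))   # 4 layers, 6 tokens, hidden dim 8
fused = adaptive_layer_fusion(states, rng.normal(size=8))
print(fused.shape)
```

Because the mixture weights are computed per token, the predictor can draw fine-grained detail for some positions and coarse-grained context for others, which is the intuition the abstract attributes to adaptively exploiting multi-granularity representations.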