BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset

Paper Title

BERTering RAMS: What and How Much does BERT Already Know About Event Arguments? -- A Study on the RAMS Dataset

Authors

Gangal, Varun; Hovy, Eduard

Abstract

Using the attention-map-based probing framework of Clark et al. (2019), we observe that, on the RAMS dataset (Ebner et al., 2020), BERT's attention heads have a modest but well-above-chance ability to spot event arguments sans any training or domain fine-tuning, varying from a low of 17.77% for Place to a high of 51.61% for Artifact. Next, we find that linear combinations of these heads, estimated with approximately 11% of the total available event argument detection supervision, can push performance considerably higher for some roles: the highest two are Victim (68.29% accuracy) and Artifact (58.82% accuracy). Furthermore, we investigate how well our methods do for cross-sentence event arguments. We propose a procedure to isolate "best heads" for cross-sentence argument detection separately from those for intra-sentence arguments. The heads thus estimated have superior cross-sentence performance compared to their jointly estimated equivalents, albeit only under the unrealistic assumption that we already know the argument is present in another sentence. Lastly, we seek to isolate to what extent our numbers stem from lexical-frequency-based associations between gold arguments and roles. We propose NONCE, a scheme to create adversarial test examples by replacing gold arguments with randomly generated "nonce" words. We find that learnt linear combinations are robust to NONCE, though individual best heads can be more sensitive.
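The per-head probing and the separate selection of cross-sentence heads can be sketched in Python. This is a minimal sketch under assumptions not stated in the abstract:

- Attention maps are given as nested lists `attention[layer][head][i][j]`.
- A head's prediction for a trigger is the token it attends to most strongly from the trigger.
- A prediction counts as correct when it falls inside the gold argument span.

The `Example` layout, the `cross_sentence` filter, and the function names are hypothetical illustrations, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical layout: attention[layer][head][i][j] = weight token i places on token j.
Attention = Sequence[Sequence[Sequence[Sequence[float]]]]
# Hypothetical example: (attention, trigger index, role, inclusive gold span, argument in another sentence?)
Example = Tuple[Attention, int, str, Tuple[int, int], bool]


def head_prediction(attention: Attention, layer: int, head: int, trigger: int) -> int:
    """Index of the token this head attends to most from the trigger (assumed probing rule)."""
    row = attention[layer][head][trigger]
    return max(range(len(row)), key=lambda j: row[j])


def best_heads_per_role(
    examples: List[Example], cross_sentence: Optional[bool] = None
) -> Dict[str, Tuple[Tuple[int, int], float]]:
    """Map each role to its most accurate (layer, head) and that head's accuracy.

    cross_sentence=True/False keeps only examples whose argument is (is not) in a
    different sentence from the trigger; None uses every example (joint estimation).
    """
    correct: Dict[str, Dict[Tuple[int, int], int]] = defaultdict(lambda: defaultdict(int))
    totals: Dict[str, int] = defaultdict(int)
    for attention, trigger, role, (start, end), is_cross in examples:
        if cross_sentence is not None and is_cross != cross_sentence:
            continue
        totals[role] += 1
        for layer, heads in enumerate(attention):
            for head in range(len(heads)):
                pred = head_prediction(attention, layer, head, trigger)
                correct[role][(layer, head)] += int(start <= pred <= end)
    return {
        role: max(
            ((lh, hits / totals[role]) for lh, hits in correct[role].items()),
            key=lambda pair: pair[1],
        )
        for role in totals
    }


# Toy data (made up): 1 layer, 2 heads, 3 tokens per example.
toy_examples: List[Example] = [
    (
        [[  # layer 0
            [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]],  # head 0: trigger -> token 2
            [[0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]],  # head 1: trigger -> token 1
        ]],
        0, "Victim", (2, 2), False,
    ),
    (
        [[  # layer 0
            [[0.4, 0.3, 0.3], [0.2, 0.2, 0.6], [0.3, 0.3, 0.4]],  # head 0: trigger -> token 2
            [[0.4, 0.3, 0.3], [0.6, 0.2, 0.2], [0.3, 0.3, 0.4]],  # head 1: trigger -> token 0
        ]],
        1, "Victim", (0, 0), True,
    ),
]
```

In this toy data, the intra-sentence and cross-sentence subsets favor different heads, while the joint estimate ties them. This is the kind of difference that selecting heads separately per subset is meant to surface.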
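A "linear combination of heads" can be illustrated with a log-linear probe in the spirit of the attention-only probe of Clark et al. (2019). The probe works as follows:

- Each candidate token j is scored as a weighted sum of the trigger's attention to j across all heads.
- The scores are softmax-normalized.
- One weight per head is fit by gradient descent on the negative log-likelihood of the gold argument token.

This is a simplified, one-direction sketch. The abstract does not give the paper's exact parameterization, and the feature layout, `train_head_combination`, learning rate, and epoch count are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: features[k][j] = attention from the trigger to candidate
# token j in head k, with heads from all layers flattened into one list.
Features = Sequence[Sequence[float]]


def token_scores(weights: Sequence[float], features: Features) -> List[float]:
    """score_j = sum_k weights[k] * features[k][j]."""
    num_tokens = len(features[0])
    return [sum(w * f[j] for w, f in zip(weights, features)) for j in range(num_tokens)]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def train_head_combination(
    data: List[Tuple[Features, int]], num_heads: int, epochs: int = 200, lr: float = 0.5
) -> List[float]:
    """Fit one weight per head by batch gradient descent on -log p(gold token)."""
    weights = [0.0] * num_heads
    for _ in range(epochs):
        grad = [0.0] * num_heads
        for features, gold in data:
            probs = softmax(token_scores(weights, features))
            for k, f in enumerate(features):
                # d(-log p_gold)/d w_k = E_p[f_k] - f_k[gold]
                grad[k] += sum(p * fj for p, fj in zip(probs, f)) - f[gold]
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad)]
    return weights


def predict(weights: Sequence[float], features: Features) -> int:
    """Index of the highest-scoring candidate token."""
    scores = token_scores(weights, features)
    return max(range(len(scores)), key=lambda j: scores[j])


# Toy data (made up): head 0 points at the gold argument, head 1 does not.
toy_data = [
    ([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]], 1),
    ([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]], 0),
]
learned = train_head_combination(toy_data, num_heads=2)
```

On the toy data, the probe learns a positive weight for the informative head and a negative weight for the misleading one. A small labeled subset is enough to fit such a probe because it has only one parameter per head.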
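The NONCE idea can be sketched in a few lines: replace a gold argument with random "nonce" words, so that no lexical-frequency association between the argument word and its role survives. The abstract does not say how the paper forms nonce words. This sketch assumes one pronounceable consonant-vowel pseudo-word per argument token, rejected if it collides with a known vocabulary. `make_nonce_word`, `nonce_example`, and the example sentence are hypothetical.

```python
import random
from typing import List, Set, Tuple

CONSONANTS = "bcdfghjklmnprstvz"
VOWELS = "aeiou"


def make_nonce_word(rng: random.Random, vocab: Set[str], syllables: int = 3) -> str:
    """Random consonant-vowel pseudo-word (assumed shape) that is not in vocab."""
    while True:
        word = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))
        if word not in vocab:
            return word


def nonce_example(
    tokens: List[str], span: Tuple[int, int], vocab: Set[str], seed: int = 0
) -> List[str]:
    """Copy of tokens with every token in the inclusive gold span replaced by a nonce word."""
    rng = random.Random(seed)  # fixed seed keeps the adversarial set reproducible
    start, end = span
    return [
        make_nonce_word(rng, vocab) if start <= i <= end else tok
        for i, tok in enumerate(tokens)
    ]


# Made-up sentence; tokens 3-4 ("old market") play the role of a gold Place argument.
tokens = ["Troops", "shelled", "the", "old", "market", "."]
vocab = {tok.lower() for tok in tokens}
adversarial = nonce_example(tokens, (3, 4), vocab, seed=13)
```

Only the argument tokens change, so the trigger and the surrounding context seen by the attention heads stay the same. Any drop in accuracy on such examples therefore points to reliance on the argument's surface form.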
