Paper Title
A Comparative Study on Textual Saliency of Styles from Eye Tracking, Annotations, and Language Models
Paper Authors
Paper Abstract
There is growing interest in incorporating eye-tracking data and other implicit measures of human language processing into natural language processing (NLP) pipelines. The data from human language processing contain unique insight into human linguistic understanding that could be exploited by language models. However, many unanswered questions remain about the nature of this data and how it can best be utilized in downstream NLP tasks. In this paper, we present eyeStyliency, an eye-tracking dataset for human processing of stylistic text (e.g., politeness). We develop a variety of methods to derive style saliency scores over text using the collected eye dataset. We further investigate how this saliency data compares to both human annotation methods and model-based interpretability metrics. We find that while eye-tracking data is unique, it also intersects with both human annotations and model-based importance scores, providing a possible bridge between human- and machine-based perspectives. We propose utilizing this type of data to evaluate the cognitive plausibility of models that interpret style. Our eye-tracking data and processing code are publicly available.