Paper title
The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts
Paper authors
Abstract
Eye movement recordings from reading are one of the richest signals of human language processing. Corpora of eye movements recorded during the reading of contextualized running text are a way of making such recordings available for natural language processing purposes. Such corpora already exist for some languages. We present CopCo, the Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts. It is the first eye tracking corpus of its kind for the Danish language. CopCo includes 1,832 sentences with 34,897 tokens of Danish text extracted from a collection of speech manuscripts. This first release of the corpus contains eye tracking data from 22 participants. It will be extended continuously with more participants and texts from other genres. We assess the data quality of the recorded eye movements and find that the extracted features are in line with related research. The dataset is available here: https://osf.io/ud8s5/.