Paper Title

Linguistic Features for Readability Assessment

Paper Authors

Tovly Deutsch, Masoud Jasbi, Stuart Shieber

Paper Abstract

Readability assessment aims to automatically classify text by the level appropriate for learning readers. Traditional approaches to this task utilize a variety of linguistically motivated features paired with simple machine learning models. More recent methods have improved performance by discarding these features and utilizing deep learning models. However, it is unknown whether augmenting deep learning models with linguistically motivated features would improve performance further. This paper combines these two approaches with the goal of improving overall model performance and addressing this question. Evaluating on two large readability corpora, we find that, given sufficient training data, augmenting deep learning models with linguistically motivated features does not improve state-of-the-art performance. Our results provide preliminary evidence for the hypothesis that the state-of-the-art deep learning models represent linguistic features of the text related to readability. Future research on the nature of representations formed in these models can shed light on the learned features and their relations to linguistically motivated ones hypothesized in traditional approaches.
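
To illustrate the kind of feature augmentation the abstract describes, below is a minimal sketch of combining hand-crafted linguistic features with a pooled text embedding from a deep model before a classification layer. This is not the authors' exact architecture: the framework (PyTorch), class name, feature set, dimensions, and number of readability levels are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of augmenting a
# deep text encoder with linguistically motivated features for readability
# classification.
import torch
import torch.nn as nn

class FeatureAugmentedClassifier(nn.Module):
    def __init__(self, encoder_dim=768, num_linguistic_features=8, num_levels=5):
        super().__init__()
        # Linear head over the concatenation of the text embedding and the
        # hand-crafted feature vector (e.g. sentence length, word frequency);
        # the specific features and sizes here are placeholders.
        self.head = nn.Linear(encoder_dim + num_linguistic_features, num_levels)

    def forward(self, text_embedding, linguistic_features):
        # text_embedding: (batch, encoder_dim) pooled output of a deep model
        #                 such as BERT (assumed choice of encoder).
        # linguistic_features: (batch, num_linguistic_features) hand-crafted
        #                      readability features computed from the text.
        combined = torch.cat([text_embedding, linguistic_features], dim=-1)
        return self.head(combined)  # logits over readability levels

# Illustrative usage with random tensors standing in for real inputs.
model = FeatureAugmentedClassifier()
logits = model(torch.randn(2, 768), torch.randn(2, 8))
print(logits.shape)  # torch.Size([2, 5])
```

The baseline deep-learning-only model discussed in the abstract corresponds to dropping the linguistic feature vector and classifying from the text embedding alone; the paper's finding is that, with sufficient training data, the augmented variant does not outperform that baseline.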
