Automated Scoring of Graphical Open-Ended Responses Using Artificial Neural Networks

Paper Title

Automated Scoring of Graphical Open-Ended Responses Using Artificial Neural Networks

Authors

Matthias von Davier, Lillian Tyack, Lale Khorramdel

Abstract

Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a computer-based international mathematics and science assessment. We compare the classification accuracy of convolutional and feedforward approaches. Our results show that convolutional neural networks (CNNs) outperform feedforward neural networks in both loss and accuracy. The CNN models classified up to 97.71% of the image responses into the appropriate scoring category, which is comparable to, if not more accurate than, typical human raters. These findings were further strengthened by the observation that the most accurate CNN models correctly classified some image responses that had been incorrectly scored by the human raters. As an additional innovation, we outline a method to select human-rated responses for the training sample based on an application of the expected response function derived from item response theory. This paper argues that CNN-based automated scoring of image responses is a highly accurate procedure that could potentially replace the workload and cost of second human raters for large-scale assessments, while improving the validity and comparability of scoring complex constructed-response items.
