Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in Conference Peer Review

Paper title

Prior and Prejudice: The Novice Reviewers' Bias against Resubmissions in Conference Peer Review

Authors

Stelmakh, Ivan; Shah, Nihar B.; Singh, Aarti; Daumé III, Hal

Abstract

Modern machine learning and computer science conferences are experiencing a surge in the number of submissions. Because the number of competent reviewers is growing at a much slower rate, this surge challenges the quality of peer review. To curb this trend and reduce the burden on reviewers, several conferences have started encouraging or even requiring authors to declare the previous submission history of their papers. Such initiatives have been met with skepticism among authors, who raise concerns about a potential bias in reviewers' recommendations induced by this information. In this work, we investigate whether reviewers exhibit a bias caused by the knowledge that the submission under review was previously rejected at a similar venue. We focus on a population of novice reviewers, who constitute a large fraction of the reviewer pool in leading machine learning and computer science conferences. We design and conduct a randomized controlled trial closely replicating the relevant components of the peer-review pipeline, with $133$ reviewers (master's students, junior PhD students, and recent graduates of top US universities) writing reviews for $19$ papers. The analysis reveals that reviewers indeed become negatively biased when they receive a signal that the paper is a resubmission. On a 10-point Likert item, they give an overall score almost 1 point lower ($\Delta = -0.78$, $95\%$ CI $= [-1.30, -0.24]$) than reviewers who do not receive such a signal. Looking at specific criteria scores (originality, quality, clarity, and significance), we observe that novice reviewers tend to underrate quality the most.
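The headline result is a difference in mean overall scores between two randomized arms, reported with a 95% confidence interval. As a rough illustration only, the Python sketch below does three things:

1. It randomly splits reviewers into a "signal" arm, told that the paper is a resubmission, and a "control" arm.
2. It computes $\Delta$ as the signal-arm mean minus the control-arm mean.
3. It attaches a percentile bootstrap interval to $\Delta$.

The helper names (`assign_conditions`, `mean_difference`, `bootstrap_ci`) and the scores are hypothetical. This is not necessarily the estimator or test the authors used, and it ignores design details such as which paper each reviewer was assigned.

```python
import random
import statistics


def assign_conditions(reviewer_ids, seed=0):
    """Hypothetical helper: randomly split reviewers into a 'signal' arm
    (told the paper is a resubmission) and a 'control' arm (not told)."""
    rng = random.Random(seed)
    ids = list(reviewer_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"signal": ids[:half], "control": ids[half:]}


def mean_difference(treated, control):
    """Delta = mean score in the signal arm minus mean score in the control arm."""
    return statistics.mean(treated) - statistics.mean(control)


def bootstrap_ci(treated, control, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Delta: resample each arm independently
    with replacement, recompute Delta, and take the alpha/2 and
    1 - alpha/2 empirical quantiles of the resampled values."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        deltas.append(mean_difference(t, c))
    deltas.sort()
    lo = deltas[int((alpha / 2) * n_boot)]
    hi = deltas[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # 133 reviewers, as in the study; the split itself is illustrative.
    arms = assign_conditions(range(133))
    print(len(arms["signal"]), len(arms["control"]))

    # Made-up overall scores on a 10-point scale (NOT data from the paper).
    signal_scores = [4, 5, 5, 6, 3, 5, 4, 6]
    control_scores = [5, 6, 6, 7, 5, 6, 5, 7]
    delta = mean_difference(signal_scores, control_scores)
    print("Delta =", delta)
    print("95% bootstrap CI =", bootstrap_ci(signal_scores, control_scores))
```

With these made-up scores, $\Delta = 4.75 - 5.875 = -1.125$. A negative $\Delta$ whose interval lies entirely below zero would indicate systematically lower scores in the signal arm, which is the same direction as the effect reported in the abstract.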
