Paper Title

Synthetic Disinformation Attacks on Automated Fact Verification Systems

Authors

Yibing Du, Antoine Bosselut, Christopher D. Manning

Abstract

Automated fact-checking is a needed technology to curtail the spread of online misinformation. One current framework for such solutions proposes to verify claims by retrieving supporting or refuting evidence from related textual sources. However, the realistic use cases for fact-checkers will require verifying claims against evidence sources that could be affected by the same misinformation. Furthermore, the development of modern NLP tools that can produce coherent, fabricated content would allow malicious actors to systematically generate adversarial disinformation for fact-checkers. In this work, we explore the sensitivity of automated fact-checkers to synthetic adversarial evidence in two simulated settings: Adversarial Addition, where we fabricate documents and add them to the evidence repository available to the fact-checking system, and Adversarial Modification, where existing evidence source documents in the repository are automatically altered. Our study across multiple models on three benchmarks demonstrates that these systems suffer significant performance drops against these attacks. Finally, we discuss the growing threat of modern NLG systems as generators of disinformation in the context of the challenges they pose to automated fact-checkers.
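
The two simulated settings can be illustrated with a small sketch. The snippet below is a minimal, hypothetical simulation of the attacks against a toy evidence store; it is not the authors' released code, the `EvidenceRepository` class and function names are illustrative assumptions, and in the paper's setting the injected or rewritten text would come from an NLG model rather than the placeholder strings used here.

```python
# Hypothetical sketch of the two attack settings described in the abstract.
# Assumptions: a toy in-memory evidence store and placeholder attack text.

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EvidenceRepository:
    """Toy evidence store: maps document IDs to raw text."""
    docs: Dict[str, str] = field(default_factory=dict)

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text

    def modify(self, doc_id: str, edit_fn: Callable[[str], str]) -> None:
        self.docs[doc_id] = edit_fn(self.docs[doc_id])


def adversarial_addition(repo: EvidenceRepository,
                         fabricated_docs: List[str]) -> None:
    """Adversarial Addition: inject fabricated documents into the
    evidence repository that the fact-checker retrieves from."""
    for i, text in enumerate(fabricated_docs):
        repo.add(f"fabricated-{i}", text)


def adversarial_modification(repo: EvidenceRepository,
                             target_ids: List[str],
                             rewrite_fn: Callable[[str], str]) -> None:
    """Adversarial Modification: automatically alter existing evidence
    documents so they no longer support the original claim."""
    for doc_id in target_ids:
        repo.modify(doc_id, rewrite_fn)


if __name__ == "__main__":
    repo = EvidenceRepository({"wiki-42": "The Eiffel Tower is in Paris."})

    # Inject a fabricated document (placeholder text instead of NLG output).
    adversarial_addition(repo, ["The Eiffel Tower was moved to Lyon in 2020."])

    # Corrupt an existing document with a trivial rewrite function.
    adversarial_modification(repo, ["wiki-42"],
                             lambda t: t.replace("Paris", "Lyon"))

    print(repo.docs)
```

In this toy setup, any downstream retriever that indexes `repo.docs` would now surface contaminated evidence, which mirrors the paper's finding that fact-checkers degrade when their evidence source is poisoned.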
