Paper Title
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Paper Authors
Paper Abstract
In recent years, there has been substantial growth in the capabilities of systems designed to generate text that mimics the fluency and coherence of human language. This has prompted considerable research examining the potential uses of these natural language generators (NLG) across a wide range of tasks. The increasing capability of powerful text generators to mimic human writing convincingly raises the potential for deception and other forms of dangerous misuse. As these systems improve, and it becomes ever harder to distinguish between human-written and machine-generated text, malicious actors could leverage these powerful NLG systems towards a wide variety of ends, including the creation of fake news and misinformation, the generation of fake online product reviews, or the use of chatbots as a means of convincing users to divulge private information. In this paper, we provide an overview of the NLG field via the identification and examination of 119 survey-like papers focused on NLG research. From these identified papers, we outline a proposed high-level taxonomy of the central concepts that constitute NLG, including the methods used to develop generalised NLG systems, the means by which these systems are evaluated, and the popular NLG tasks and subtasks that exist. In turn, we provide an overview and discussion of each of these items with respect to current research, and offer an examination of the potential roles of NLG in deception and in detection systems designed to counteract these threats. Moreover, we discuss the broader challenges of NLG, including the risks of bias that are often exhibited by existing text generation systems. This work offers a broad overview of the field of NLG with respect to its potential for misuse, aiming to provide a high-level understanding of this rapidly developing area of research.