Title
Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
Authors
Abstract
The past decade has witnessed dramatic gains in natural language processing and an unprecedented scaling of large language models. These developments have been accelerated by the advent of few-shot techniques such as chain of thought (CoT) prompting. Specifically, CoT pushes the performance of large language models in a few-shot setup by augmenting the prompts with intermediate steps. Despite impressive results across various tasks, the reasons behind its success have not been explored. This work uses counterfactual prompting to develop a deeper understanding of CoT-based few-shot prompting mechanisms in large language models. We first systematically identify and define the key components of a prompt: symbols, patterns, and text. Then, we devise and conduct an exhaustive set of experiments across four different tasks, by querying the model with counterfactual prompts where only one of these components is altered. Our experiments across three models (PaLM, GPT-3, and Codex) reveal several surprising findings and bring into question the conventional wisdom around few-shot prompting. First, the presence of factual patterns in a prompt is practically immaterial to the success of CoT. Second, our results conclude that the primary role of intermediate steps may not be to facilitate learning how to solve a task. The intermediate steps are rather a beacon for the model to realize what symbols to replicate in the output to form a factual answer. Further, text imbues patterns with commonsense knowledge and meaning. Our empirical and qualitative analysis reveals that a symbiotic relationship between text and patterns explains the success of few-shot prompting: text helps extract commonsense from the question to help patterns, and patterns enforce task understanding and direct text generation.