Paper Title

Training Naturalized Semantic Parsers with Very Little Data

Paper Authors

Subendhu Rongali, Konstantine Arkoudas, Melanie Rubino, Wael Hamza

Paper Abstract

Semantic parsing is an important NLP problem, particularly for voice assistants such as Alexa and Google Assistant. State-of-the-art (SOTA) semantic parsers are seq2seq architectures based on large language models that have been pretrained on vast amounts of text. To better leverage that pretraining, recent work has explored a reformulation of semantic parsing whereby the output sequences are themselves natural language sentences, but in a controlled fragment of natural language. This approach delivers strong results, particularly for few-shot semantic parsing, which is of key importance in practice and the focus of our paper. We push this line of work forward by introducing an automated methodology that delivers very significant additional improvements by utilizing modest amounts of unannotated data, which is typically easy to obtain. Our method is based on a novel synthesis of four techniques: joint training with auxiliary unsupervised tasks; constrained decoding; self-training; and paraphrasing. We show that this method delivers new SOTA few-shot performance on the Overnight dataset, particularly in very low-resource settings, and very compelling few-shot results on a new semantic parsing dataset.
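
For concreteness, below is a minimal, self-contained sketch of one of the four techniques the abstract names, constrained decoding, in which generation is restricted to the controlled natural-language fragment. Everything here is an illustrative assumption rather than the paper's implementation: the toy canonical sentences, the trie helpers (build_trie, allowed prefixes), and the dummy scorer all stand in for a real grammar and a pretrained seq2seq model's next-token distribution.

```python
# Sketch of trie-based constrained decoding over a toy controlled fragment.
# Hypothetical illustration only; not the paper's actual method or grammar.

from collections import defaultdict

EOS = "<eos>"

# A handful of canonical output sentences in the controlled fragment
# (invented for this example).
CANONICAL = [
    ["play", "music", "by", "artist", EOS],
    ["play", "music", "by", "genre", EOS],
    ["set", "alarm", "for", "time", EOS],
]

def build_trie(sequences):
    """Map each prefix (as a tuple) to the set of tokens allowed next."""
    trie = defaultdict(set)
    for seq in sequences:
        for i, tok in enumerate(seq):
            trie[tuple(seq[:i])].add(tok)
    return trie

def greedy_decode(scores, trie):
    """Greedy decoding, masking any token the trie does not allow.

    `scores(prefix)` stands in for a seq2seq model's next-token
    distribution; here it is any callable returning {token: logit}.
    """
    prefix = []
    while not prefix or prefix[-1] != EOS:
        allowed = trie[tuple(prefix)]
        step = scores(prefix)
        # Constrained decoding: restrict the argmax to grammar-legal tokens.
        prefix.append(max(allowed, key=lambda t: step.get(t, float("-inf"))))
    return prefix

if __name__ == "__main__":
    trie = build_trie(CANONICAL)
    # Dummy scorer that arbitrarily prefers alphabetically later tokens.
    dummy = lambda prefix: {t: i for i, t in enumerate(sorted(trie[tuple(prefix)]))}
    print(greedy_decode(dummy, trie))  # ['set', 'alarm', 'for', 'time', '<eos>']
```

The same masking idea carries over to beam search, where each hypothesis keeps its own prefix and only grammar-legal continuations are scored.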
