Paper Title
Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models
Paper Authors
Paper Abstract
Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representations: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an LSTM, and two LSTM variants trained with explicit structural supervision (Dyer et al., 2016; Charniak et al., 2016). We find that in most cases, the neural models are able to induce the proper syntactic generalizations after minimal exposure, often from just two examples during training, and that the two structurally supervised models generalize more accurately than the LSTM model. All neural models are able to leverage information learned in base contexts to drive expectations in transformed contexts, indicating that they have learned some invariance properties of syntax.
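The controlled evaluations the abstract describes are typically operationalized by comparing model surprisals for minimally different grammatical and ungrammatical continuations. The following Python sketch illustrates that paradigm with a toy add-one-smoothed bigram language model standing in for the paper's LSTM and structurally supervised models; the corpus, function names, and test pair are illustrative assumptions, not the paper's actual code or data.

```python
# Minimal sketch of a targeted syntactic evaluation: the model should assign
# lower surprisal to the grammatical agreement continuation than to the
# ungrammatical one. The bigram LM is a stand-in for the paper's models.
import math
from collections import Counter

# Toy training corpus (illustrative only).
corpus = [
    "the key is on the table".split(),
    "the keys are on the table".split(),
    "the dog is near the door".split(),
    "the dogs are near the door".split(),
]

# Collect unigram and bigram counts, including a sentence-start symbol.
unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = ["<s>"] + sent
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))
vocab_size = len(unigrams)

def surprisal(sentence):
    """Total surprisal (in bits) of a sentence under the smoothed bigram LM."""
    toks = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        # Add-one smoothing so unseen bigrams get nonzero probability.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log2(p)
    return total

# Agreement probe: a plural subject should make the plural verb form
# less surprising than the singular one.
grammatical = "the keys are on the table"
ungrammatical = "the keys is on the table"
print(surprisal(grammatical) < surprisal(ungrammatical))  # expected: True
```

In the paper's setting, the same comparison would be run with neural models and with test items whose critical tokens appeared only a handful of times in training, so that a correct preference reflects few-shot syntactic generalization rather than memorized n-gram statistics.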