Paper Title

Ranking Creative Language Characteristics in Small Data Scenarios

Paper Authors

Julia Siekiera, Marius Köppel, Edwin Simpson, Kevin Stowe, Iryna Gurevych, Stefan Kramer

Paper Abstract

The ability to rank creative natural language provides an important general tool for downstream language understanding and generation. However, current deep ranking models require substantial amounts of labeled data that are difficult and expensive to obtain for different domains, languages, and creative characteristics. A recent neural approach, the DirectRanker, promises to reduce the amount of training data needed, but its application to text is not fully explored. We therefore adapt the DirectRanker to provide a new deep model for ranking creative language with small data. We compare DirectRanker with a Bayesian approach, Gaussian process preference learning (GPPL), which has previously been shown to work well with sparse data. Our experiments with sparse training data show that while the performance of standard neural ranking approaches collapses with small training datasets, DirectRanker remains effective. We find that combining DirectRanker with GPPL increases performance across different settings by leveraging the complementary benefits of both models. Our combined approach outperforms the previous state-of-the-art on humor and metaphor novelty tasks, increasing Spearman's $\rho$ by 14% and 16% on average.
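
To make the pairwise idea behind the DirectRanker concrete, here is a minimal sketch in PyTorch, not the authors' implementation: a shared feature network is applied to both items of a pair, and a bias-free output layer with an odd activation acts on the feature difference, so the predicted preference is antisymmetric (swapping the two inputs flips the sign). The class name `DirectRankerSketch`, the layer sizes, and the toy data are all illustrative assumptions.

```python
# Hedged sketch of a DirectRanker-style pairwise ranking model (assumed
# PyTorch implementation; hyperparameters and toy data are illustrative).
import torch
import torch.nn as nn

class DirectRankerSketch(nn.Module):
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        # Shared feature extractor applied to both items of a pair.
        self.features = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        # Bias-free output layer: with the odd tanh activation below,
        # this makes the model antisymmetric, f(a, b) = -f(b, a).
        self.out = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, x1, x2):
        # Score the difference of shared features; tanh keeps output in (-1, 1).
        return torch.tanh(self.out(self.features(x1) - self.features(x2)))

# Toy usage: labels are +1 if the first item should rank higher, -1 otherwise.
model = DirectRankerSketch(input_dim=16)
x1, x2 = torch.randn(8, 16), torch.randn(8, 16)
labels = torch.randint(0, 2, (8, 1)).float() * 2 - 1  # values in {-1, +1}
loss = torch.mean((model(x1, x2) - labels) ** 2)
loss.backward()
```

Because the model only ever compares pairs, each labeled pair is a training example in its own right, which is one reason this family of models can remain effective when the number of annotated items is small.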
