Paper Title
Exploring Fluent Query Reformulations with Text-to-Text Transformers and Reinforcement Learning
Paper Authors
Paper Abstract
Query reformulation aims to alter noisy or ambiguous text sequences into coherent ones that are closer to natural language questions. This prevents errors from propagating in a client-facing pipeline and promotes better communication with users. In addition, it is crucial to maintain performance in downstream environments such as question answering when rephrased queries are given as input. We show that under the previous framework (AQA), attempts to alter the RL algorithm do not bring significant benefits to either reward acquisition or sequence fluency. Instead, we leverage a query-reformulating text-to-text transformer (QRT5) and apply policy-based RL algorithms to further nudge this reformulator toward generating reward-acquiring query trajectories that obtain better answers downstream. QRT5 shows better sample efficiency in RL, reaching the same level of QA performance as the previous approach. It generates reformulations with greater readability, as measured by query well-formedness evaluations, and generalizes to out-of-sample data. Our framework is also demonstrated to be flexible, allowing reward signals to be sourced from different downstream environments such as intent classification.
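As a rough illustration of the approach described in the abstract, the sketch below fine-tunes a T5-based query reformulator with a REINFORCE-style policy gradient, drawing a scalar reward from a downstream environment such as a QA system. The helper `qa_reward`, the `t5-small` checkpoint, and all hyperparameters are illustrative assumptions and not details taken from the paper.

```python
# Minimal sketch: policy-gradient (REINFORCE) fine-tuning of a T5 reformulator.
# Assumes a downstream environment exposes a scalar reward (e.g. answer F1);
# `qa_reward` is a placeholder, not part of the paper's released code.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def qa_reward(reformulation: str) -> float:
    """Placeholder: send the reformulation to the downstream environment
    (QA or intent classification) and return its score as the reward."""
    return 0.0  # replace with a real call to the downstream system

def reinforce_step(raw_query: str) -> float:
    enc = tokenizer(raw_query, return_tensors="pt").to(device)

    # Sample a reformulation (a "query trajectory") from the current policy.
    sample = model.generate(
        **enc, do_sample=True, top_k=50, max_length=64,
        return_dict_in_generate=True,
    )
    seq = sample.sequences  # starts with the decoder start token
    text = tokenizer.decode(seq[0], skip_special_tokens=True)
    reward = qa_reward(text)

    # Re-run the decoder with teacher forcing to get the sample's log-prob.
    labels = seq[:, 1:].clone()
    labels[labels == tokenizer.pad_token_id] = -100
    out = model(**enc, labels=labels)
    log_prob = -out.loss * (labels != -100).sum()  # sum of token log-probs
    loss = -reward * log_prob                      # REINFORCE objective

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

In practice a reward baseline and batched sampling would reduce variance; the single-sample update above is only meant to show how a reward from the downstream environment can back-propagate into the text-to-text reformulator.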