Paper Title
Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection
Paper Authors
Paper Abstract
In modern interactive speech-based systems, speech is consumed and transcribed incrementally prior to having disfluencies removed. This post-processing step is crucial for producing clean transcripts and high performance on downstream tasks (e.g. machine translation). However, most current state-of-the-art NLP models such as the Transformer operate non-incrementally, potentially causing unacceptable delays. We propose a streaming BERT-based sequence tagging model that, combined with a novel training objective, is capable of detecting disfluencies in real-time while balancing accuracy and latency. This is accomplished by training the model to decide whether to immediately output a prediction for the current input or to wait for further context. Essentially, the model learns to dynamically size its lookahead window. Our results demonstrate that our model produces comparably accurate predictions and does so sooner than our baselines, with lower flicker. Furthermore, the model attains state-of-the-art latency and stability scores when compared with recent work on incremental disfluency detection.
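To make the emit-or-wait mechanism concrete, below is a minimal sketch of streaming decoding with an explicit WAIT label. This is not the authors' implementation: the `StreamingTagger` class, the three-way label set, and the `stream_decode` loop are illustrative assumptions, and a tiny randomly initialized bidirectional encoder stands in for a fine-tuned BERT so the example stays self-contained and runnable.

```python
import torch
import torch.nn as nn

# Hypothetical label set: the extra WAIT label lets the model signal that it
# wants more right-hand context before committing to a prediction for a token.
FLUENT, DISFLUENT, WAIT = 0, 1, 2

class StreamingTagger(nn.Module):
    """Toy stand-in for a BERT encoder with a token-classification head.

    A real system would fine-tune a pretrained bidirectional encoder with the
    paper's wait-augmented training objective; a small randomly initialized
    bidirectional GRU keeps this sketch self-contained.
    """
    def __init__(self, vocab_size=1000, dim=32, num_labels=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Bidirectional, so a token's label can change as right context arrives.
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * dim, num_labels)

    def forward(self, token_ids):                    # (1, seq_len)
        hidden, _ = self.encoder(self.embed(token_ids))
        return self.head(hidden)                     # (1, seq_len, num_labels)

def stream_decode(model, token_ids):
    """Tag a transcript incrementally, deferring tokens labeled WAIT.

    Each time a token arrives, the prefix is re-encoded and every pending
    position is re-scored; a position is finalized (emitted) as soon as its
    argmax label is no longer WAIT. The delay between a token's arrival and
    its emission is that token's dynamically sized lookahead window.
    """
    finalized = {}  # position -> FLUENT or DISFLUENT; never revised (no flicker)
    for t in range(1, len(token_ids) + 1):
        with torch.no_grad():
            labels = model(torch.tensor([token_ids[:t]])).argmax(dim=-1)[0]
        for pos in range(t):
            if pos not in finalized and labels[pos].item() != WAIT:
                finalized[pos] = labels[pos].item()
    # At end of stream, force a decision for any still-pending positions by
    # taking the best non-WAIT label.
    with torch.no_grad():
        logits = model(torch.tensor([token_ids]))[0]
    for pos in range(len(token_ids)):
        if pos not in finalized:
            finalized[pos] = logits[pos, :WAIT].argmax().item()
    return [finalized[p] for p in range(len(token_ids))]

if __name__ == "__main__":
    model = StreamingTagger().eval()
    print(stream_decode(model, token_ids=list(range(10))))
```

In this scheme, the number of steps a position stays pending is exactly the dynamically sized lookahead window the abstract describes, and because finalized labels are never revised, the output does not flicker.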