Paper Title
Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Authors
Abstract
Speaker change detection is an important task in multi-party interactions such as meetings and conversations. In this paper, we address the speaker change detection task from the perspective of sequence transduction. Specifically, we propose a novel encoder-decoder framework that directly converts the input feature sequence into the speaker identity sequence. A difference-based continuous integrate-and-fire mechanism is designed to support this framework. It detects speaker changes by integrating the speaker difference between encoder outputs frame by frame, and converts the encoder outputs into segment-level speaker embeddings according to the detected speaker changes. The whole framework is supervised by the speaker identity sequence, a weaker label than the precise speaker change points. Experiments on the AMI and DIHARD-I corpora show that our sequence-level method consistently outperforms a strong frame-level baseline that uses the precise speaker change labels.
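To make the integrate-and-fire idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a fixed, untrained difference measure: 1 minus the cosine similarity between the current encoder frame and the mean of the frames in the currently open segment. It also assumes a firing threshold of 1.0 and mean pooling as the segment-level speaker embedding. In the paper the mechanism is part of a trained encoder-decoder, so this heuristic only illustrates the data flow. The names `difference_based_integrate_and_fire` and `speaker_identity_sequence` are hypothetical. The second helper shows, under the assumption that consecutive frames of the same speaker form one turn, why a speaker identity sequence is a weaker label than change points: it keeps the order of speakers but drops the timing.

```python
import numpy as np


def difference_based_integrate_and_fire(enc: np.ndarray, threshold: float = 1.0):
    """Hypothetical sketch of difference-based integrate-and-fire.

    enc: (T, D) array of encoder outputs, one row per frame (T >= 1).
    Returns (segment_embeddings, change_points). change_points are the frame
    indices at which a new speaker segment is assumed to start.
    """
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    segments, change_points = [], []
    seg_start = 0
    accumulated = 0.0  # integrated speaker difference since the last firing
    for t in range(1, len(enc)):
        # Assumed difference measure: 1 - cosine similarity between frame t
        # and the mean of the frames in the open segment.
        seg_embedding = enc[seg_start:t].mean(axis=0)
        accumulated += 1.0 - cosine(enc[t], seg_embedding)
        if accumulated >= threshold:
            # Fire: close the open segment and emit its mean as a
            # segment-level speaker embedding; frame t starts a new segment.
            segments.append(seg_embedding)
            change_points.append(t)
            seg_start, accumulated = t, 0.0
    # The last open segment is emitted at the end of the sequence.
    segments.append(enc[seg_start:].mean(axis=0))
    return np.stack(segments), change_points


def speaker_identity_sequence(frame_labels):
    """Collapse frame-level speaker labels into the ordered sequence of
    speaker identities, discarding the exact change times."""
    seq = []
    for label in frame_labels:
        if not seq or seq[-1] != label:
            seq.append(label)
    return seq


if __name__ == "__main__":
    # Two toy "speakers" with orthogonal embeddings, five frames each.
    frames = np.array([[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5)
    segs, cps = difference_based_integrate_and_fire(frames)
    print(cps)         # [5]
    print(segs.shape)  # (2, 2)
    print(speaker_identity_sequence(["A"] * 5 + ["B"] * 5 + ["A"] * 3))  # ['A', 'B', 'A']
```

In this toy run, the sequence-level target `['A', 'B']` for the first ten frames tells a model how many segments to emit and in what order. It does not say that the change happens at frame 5. That is exactly the information the frame-level baseline receives and the sequence-level framework does not.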