Title

Absolute decision corrupts absolutely: conservative online speaker diarisation

Authors

Youngki Kwon, Hee-Soo Heo, Bong-Jin Lee, You Jin Kim, Jee-weon Jung

Abstract

Our focus lies in developing an online speaker diarisation framework that demonstrates robust performance across diverse domains. In online speaker diarisation, outputs generated in real time are irreversible, and a few misjudgements in the early phase of an input session can lead to catastrophic results. We hypothesise that, among many other factors, cautiously increasing the number of estimated speakers is of paramount importance. Accordingly, our proposed framework decreases the number of speakers by one when the system judges that a past increase was faulty. We also adopt dual buffers, checkpoints and centroids: checkpoints are combined with silhouette coefficients to estimate the number of speakers, and centroids represent speakers. Because we believe that more than one centroid can be generated from a single speaker, we design a clustering-based label matching technique to assign labels in real time. The resulting system is lightweight yet surprisingly effective. It demonstrates state-of-the-art performance on the DIHARD 2 and 3 datasets and is also competitive on the AMI and VoxConverse test sets.
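The conservative counting rule described in the abstract has three parts: grow the speaker count cautiously, roll back a faulty increase by one, and use silhouette coefficients to judge the count. It can be illustrated with a minimal sketch. This is not the authors' implementation. The dual buffers, checkpoints and clustering-based label matching are not modelled, and the use of k-means, cosine distance, the `margin` and `one_speaker_score` parameters, and all function names are hypothetical choices made only for illustration.

```python
import numpy as np


def silhouette(dist: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient from a precomputed pairwise distance matrix."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as the worst score
    scores = np.zeros(len(labels))
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = dist[i, same].mean()  # mean distance to the point's own cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


def _assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid (Euclidean) for every row of x."""
    return np.argmin(np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2), axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Plain k-means with deterministic farthest-point seeding; returns labels."""
    seeds = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - s, axis=1) for s in seeds], axis=0)
        seeds.append(x[np.argmax(d)])
    centroids = np.array(seeds, dtype=float)
    for _ in range(iters):
        labels = _assign(x, centroids)
        for j in range(k):
            if (labels == j).any():  # keep the old centroid if a cluster empties
                centroids[j] = x[labels == j].mean(axis=0)
    return _assign(x, centroids)


def update_num_speakers(x: np.ndarray, current_k: int, max_k: int = 10,
                        margin: float = 0.05, one_speaker_score: float = 0.5) -> int:
    """Hypothetical conservative update: change the count by at most one per call.

    x: (n, d) speaker embeddings seen so far; current_k: current speaker count.
    """
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length embeddings
    dist = 1.0 - x @ x.T  # cosine distance

    def score(k: int) -> float:
        if k == 1:
            return one_speaker_score  # assumed stand-in: silhouette is undefined for k = 1
        return silhouette(dist, kmeans(x, k))

    current = score(current_k)
    # Roll back the most recent increase if k - 1 clusters fit at least as well.
    if current_k > 1 and score(current_k - 1) >= current:
        return current_k - 1
    # Increase by one only if the silhouette gain exceeds the safety margin.
    upper = min(max_k, len(x) - 1)
    if current_k + 1 <= upper and score(current_k + 1) > current + margin:
        return current_k + 1
    return current_k
```

In this sketch, capping each call at a change of one speaker mirrors the abstract's idea that a wrong increase is undone one step at a time. The `margin` makes growth harder than rollback, which is one simple way to encode caution. The default values are arbitrary and untuned.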
