Paper title
Serial Speakers: a Dataset of TV Series
Paper authors
Paper abstract
For over a decade, TV series have drawn increasing interest from both audiences and various academic fields. But while most viewers are hooked on the continuous plots of TV serials, the few annotated datasets available to researchers focus on standalone episodes of classical TV series. We aim to fill this gap by providing the multimedia and speech processing communities with Serial Speakers, an annotated dataset of 161 episodes from three popular American TV serials: Breaking Bad, Game of Thrones and House of Cards. Serial Speakers is suitable both for investigating multimedia retrieval in realistic use-case scenarios and for addressing lower-level speech-related tasks in especially challenging conditions. We publicly release annotations for every speech turn (boundaries, speaker) and scene boundary, along with annotations for shot boundaries, recurring shots, and interacting speakers in a subset of episodes. Because of copyright restrictions, the textual content of the speech turns is encrypted in the public version of the dataset, but we provide users with a simple online tool to recover the plain text from their own subtitle files.