Paper Title

Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation

Authors

Yongqin Xian, Bruno Korbar, Matthijs Douze, Lorenzo Torresani, Bernt Schiele, Zeynep Akata

Abstract

Few-shot learning aims to recognize novel classes from a few examples. Although significant progress has been made in the image domain, few-shot video classification is relatively unexplored. We argue that previous methods underestimate the importance of video feature learning and propose to learn spatiotemporal features using a 3D CNN. We propose a two-stage approach that learns video features on base classes and then fine-tunes the classifiers on novel classes, and show that this simple baseline outperforms prior few-shot video classification methods by over 20 points on existing benchmarks. To circumvent the need for labeled examples, we present two novel approaches that yield further improvement. First, we leverage tag-labeled videos from a large dataset via tag retrieval, followed by selecting the best clips by visual similarity. Second, we learn generative adversarial networks that generate video features of novel classes from their semantic embeddings. Moreover, we find existing benchmarks are limited because they focus on only 5 novel classes in each testing episode, and we introduce more realistic benchmarks involving more novel classes, i.e. few-shot learning, as well as a mixture of novel and base classes, i.e. generalized few-shot learning. The experimental results show that our retrieval and feature generation approaches significantly outperform the baseline on the new benchmarks.
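The two-stage baseline described above can be sketched in miniature. This is only an illustration, not the paper's implementation: it assumes stage one (training the 3D CNN on base classes) has already produced fixed spatiotemporal feature vectors, and shows stage two, fine-tuning a linear softmax classifier on the few labeled clips of each novel class. All shapes, hyperparameters, and the toy data are assumptions made for the sketch.

```python
import numpy as np

def fine_tune_classifier(features, labels, n_classes, lr=0.1, epochs=200):
    """Stage two (sketch): fit a softmax classifier on frozen video features.

    `features` is (n_examples, dim); `labels` holds integer novel-class ids.
    The feature extractor stays fixed; only W and b are learned.
    """
    n, d = features.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    one_hot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - one_hot) / n                  # softmax cross-entropy gradient
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(features, W, b):
    return (features @ W + b).argmax(axis=1)

# Toy 5-way 1-shot episode with synthetic "video features": one prototype
# per novel class, plus small noise for the support and query clips.
rng = np.random.default_rng(42)
protos = rng.normal(size=(5, 16))
support = protos + 0.05 * rng.normal(size=(5, 16))    # one labeled clip per class
query = protos + 0.05 * rng.normal(size=(5, 16))
W, b = fine_tune_classifier(support, np.arange(5), 5)
preds = predict(query, W, b)
print(preds)
```

Because the features are frozen, fine-tuning reduces to cheap linear classification per episode, which is what makes this baseline simple yet strong relative to meta-learning approaches.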
