Paper title
Reproducing BowNet: Learning Representations by Predicting Bags of Visual Words
Paper authors
Paper abstract
This work aims to reproduce results from the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) learns feature representations of images from an unlabeled dataset. The original paper proposes using bag-of-words (BoW) deep feature descriptors as a self-supervised learning target to learn robust, deep representations. BowNet is trained to reconstruct the histogram of visual words (i.e., the deep BoW descriptor) of a reference image when presented with a perturbed version of that image as input. Thus, this method aims to learn perturbation-invariant and context-aware image features that can be useful for few-shot tasks or supervised downstream tasks. In the paper, the authors describe BowNet as a network consisting of a convolutional feature extractor $\Phi(\cdot)$ and a dense softmax layer $\Omega(\cdot)$ trained to predict BoW features from images. After BoW training, the features produced by $\Phi$ are used in downstream tasks. For this challenge, we attempted to build and train a network that could reproduce the CIFAR-100 accuracy improvements reported in the original paper. However, we were unsuccessful in reproducing an accuracy improvement comparable to the one the authors reported. This could be due to a variety of factors; we believe time constraints were the primary bottleneck.
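The BoW training objective described above can be illustrated with a small NumPy sketch. It is not the authors' code and makes several assumptions: local features are hard-assigned to their nearest visual word and averaged into a normalized histogram (the paper's exact assignment and pooling choices may differ), the vocabulary is a hand-picked toy example rather than one learned from a pretrained network, the output of $\Phi$ on the perturbed image is replaced by a random stand-in vector, and $\Omega$ is modeled as a plain dense layer followed by softmax. All function and variable names are hypothetical.

```python
import numpy as np


def assign_visual_words(local_feats: np.ndarray, vocab: np.ndarray) -> np.ndarray:
    """Hard-assign each local feature (N, D) to its nearest visual word in vocab (K, D)."""
    # Squared Euclidean distance between every local feature and every visual word: (N, K).
    dists = ((local_feats[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def bow_target(local_feats: np.ndarray, vocab: np.ndarray) -> np.ndarray:
    """BoW target of the reference image: normalized histogram of visual words (sums to 1)."""
    words = assign_visual_words(local_feats, vocab)
    counts = np.bincount(words, minlength=len(vocab))
    return counts / counts.sum()


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict_bow(phi_feat: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stand-in for Omega: dense layer (K x D') + softmax over the K visual words."""
    return softmax(W @ phi_feat + b)


def bow_loss(target: np.ndarray, pred: np.ndarray, eps: float = 1e-12) -> float:
    """Cross-entropy between the reference BoW target and the prediction for the perturbed image."""
    return float(-(target * np.log(pred + eps)).sum())


# Toy usage: K = 2 visual words in a 2-d feature space (assumed values).
vocab = np.array([[0.0, 0.0], [10.0, 10.0]])
ref_local = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])  # local features of the reference image
target = bow_target(ref_local, vocab)  # two features map to word 0, one to word 1 -> [2/3, 1/3]

rng = np.random.default_rng(0)
phi_perturbed = rng.normal(size=4)  # hypothetical Phi(perturbed image), pooled to a 4-d vector
W = rng.normal(size=(2, 4))
b = np.zeros(2)
pred = predict_bow(phi_perturbed, W, b)
loss = bow_loss(target, pred)
print(target, pred, loss)
```

In a real setup, the loss would be backpropagated through both $\Omega$ and $\Phi$; minimizing it pushes the prediction for the perturbed image toward the reference image's visual-word distribution, which is what encourages perturbation-invariant features.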