Reproducing BowNet: Learning Representations by Predicting Bags of Visual Words

Title

Reproducing BowNet: Learning Representations by Predicting Bags of Visual Words

Authors

Nguyen, Harry, Yun, Stone, Mohammad, Hisham

Abstract

This work aims to reproduce results from the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) is used to learn feature representations of images from an unlabeled dataset. The original paper proposes using bag-of-words (BoW) deep feature descriptors as a self-supervised learning target to learn robust, deep representations. BowNet is trained to reconstruct the histogram of visual words (i.e., the deep BoW descriptor) of a reference image when presented with a perturbed version of that image as input. The method thus aims to learn perturbation-invariant and context-aware image features that can be useful for few-shot tasks or supervised downstream tasks. In the paper, the authors describe BowNet as a network consisting of a convolutional feature extractor $\Phi(\cdot)$ and a Dense-softmax layer $\Omega(\cdot)$ trained to predict BoW features from images. After BoW training, the features of $\Phi$ are used in downstream tasks. For this challenge, we tried to build and train a network that could reproduce the CIFAR-100 accuracy improvements reported in the original paper. However, we were unable to reproduce an accuracy improvement comparable to the one the authors report. This could be due to a variety of factors; we believe that time constraints were the primary bottleneck.
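The training objective described above can be illustrated with a small, dependency-free Python sketch. It is not the original authors' code nor the code used in this reproduction. It assumes that the BoW target is built by assigning each local feature of the reference image to its nearest visual word and L1-normalizing the counts, and that the prediction from the perturbed image is compared to that target with a cross-entropy loss. The function names (`bow_target`, `bow_loss`), the toy vocabulary, and all numeric values are hypothetical.

```python
import math

def bow_target(local_features, vocabulary):
    """Assumed target construction: hard-assign each local feature to its
    nearest visual word (squared Euclidean distance) and return the
    L1-normalized histogram of word counts."""
    counts = [0.0] * len(vocabulary)
    for feat in local_features:
        dists = [sum((f - w) ** 2 for f, w in zip(feat, word))
                 for word in vocabulary]
        counts[dists.index(min(dists))] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def bow_loss(logits, target):
    """Assumed loss: cross-entropy between the softmax of the predicted
    logits and the BoW target distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# Toy vocabulary of 3 visual words in a 2-D feature space (hypothetical).
vocabulary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Local features taken from the reference image's feature map (hypothetical).
reference_features = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]]
target = bow_target(reference_features, vocabulary)

# Logits standing in for the Dense-softmax layer's pre-softmax output
# on the perturbed image (hypothetical values).
logits = [0.2, 1.0, 0.1]
loss = bow_loss(logits, target)

print("BoW target:", target)
print("loss:", loss)
```

In a real setup the logits would come from the Dense-softmax layer applied to the feature extractor's output for the perturbed image, and the loss would be minimized by gradient descent over both components; the sketch only shows how a target histogram and a prediction could be compared.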
