Paper Title
Unsupervised Multi-label Dataset Generation from Web Data
Authors
Abstract
This paper presents a system for generating multi-label datasets from web data in an unsupervised manner. The work makes two main contributions: a) the generation of a low-noise, unsupervised single-label dataset from web data, and b) the augmentation of the labels in that dataset (from single-label to multi-label). The single-label dataset is generated with an unsupervised noise reduction phase (clustering, followed by the selection of clusters using anchors), which yields 85% correctly labeled images. An unsupervised label augmentation process then assigns new labels to the images in the dataset using class activation maps and the uncertainty associated with each class. This process is applied to the dataset generated in this paper and to a public dataset (Places365), adding 9.5% and 27% extra labels to each dataset respectively, which demonstrates that the presented system can robustly enrich the initial dataset.
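The label augmentation step can be illustrated with a minimal sketch. The abstract does not give the decision rule, so everything below is an assumption made for illustration: the function name `augment_labels`, the idea that each class has a single score (for example, one derived from its class activation map), and the rule that a class is accepted when its score exceeds a base threshold plus that class's uncertainty. It is not the authors' implementation.

```python
from typing import Dict, List


def augment_labels(
    original_label: str,
    class_scores: Dict[str, float],
    class_uncertainty: Dict[str, float],
    base_threshold: float = 0.5,
) -> List[str]:
    """Return the original label plus any extra labels that pass a per-class test.

    Hypothetical rule (not taken from the paper): a class is accepted when its
    score exceeds base_threshold plus that class's uncertainty, so classes with
    higher uncertainty need stronger evidence before they are added.
    """
    labels = [original_label]
    # Iterate in sorted order so the output is deterministic.
    for name, score in sorted(class_scores.items()):
        if name == original_label:
            continue  # the single label is already present
        threshold = base_threshold + class_uncertainty.get(name, 0.0)
        if score > threshold:
            labels.append(name)
    return labels


# Toy values, invented purely for illustration.
scores = {"beach": 0.9, "sea": 0.8, "boat": 0.6, "forest": 0.1}
uncertainty = {"beach": 0.05, "sea": 0.1, "boat": 0.3, "forest": 0.1}
print(augment_labels("beach", scores, uncertainty))  # ['beach', 'sea']
```

In this toy case "sea" is added because its score clears its threshold, while "boat" is rejected because its higher uncertainty raises the bar above its score.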