Title
Reliable Label Bootstrapping for Semi-Supervised Learning
Authors
Abstract
Reducing the number of labels required to train convolutional neural networks without degrading performance is key to effectively reducing human annotation effort. We propose Reliable Label Bootstrapping (ReLaB), an unsupervised preprocessing algorithm that improves the performance of semi-supervised algorithms in extremely low supervision settings. Given a dataset with few labeled samples, we first learn meaningful self-supervised latent features for the data. Second, a label propagation algorithm propagates the known labels over the unsupervised features, effectively labeling the full dataset in an automatic fashion. Third, we select a subset of correctly labeled (reliable) samples using a label noise detection algorithm. Finally, we train a semi-supervised algorithm on the extended subset. We show that the choice of network architecture and self-supervised algorithm are important factors in achieving successful label propagation, and demonstrate that ReLaB substantially improves semi-supervised learning in scenarios of very limited supervision on CIFAR-10, CIFAR-100 and mini-ImageNet. We reach an average error rate of $\boldsymbol{22.34}$ with 1 random labeled sample per class on CIFAR-10 and lower this error to $\boldsymbol{8.46}$ when the labeled sample in each class is highly representative. Our work is fully reproducible: https://github.com/PaulAlbert31/ReLaB.
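The abstract's second step, propagating a handful of known labels over self-supervised features, can be illustrated with a standard diffusion-based label propagation on a k-nearest-neighbor graph. This is a minimal, self-contained sketch for intuition only: the function name, hyperparameters (`k`, `alpha`, `n_iter`), and the Gaussian-kernel affinity are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def propagate_labels(features, labels, k=5, alpha=0.9, n_iter=50):
    """Diffusion-based label propagation on a k-NN graph (illustrative sketch).

    features : (N, D) array of (e.g. self-supervised) sample features.
    labels   : length-N int array of class ids, with -1 marking unlabeled samples.
    Returns a length-N array assigning a class to every sample.
    """
    n = len(features)
    # Pairwise squared Euclidean distances; exclude self-matches.
    d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    # Symmetric k-NN affinity matrix with a Gaussian kernel
    # (bandwidth set heuristically from the median pairwise distance).
    sigma = np.median(np.sqrt(d[np.isfinite(d)]))
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d[i])[:k]
        W[i, nn] = np.exp(-d[i, nn] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}.
    deg = W.sum(1)
    S = W / np.sqrt(np.outer(deg, deg))
    # One-hot seed matrix Y from the few known labels.
    classes = np.unique(labels[labels >= 0])
    Y = np.zeros((n, len(classes)))
    for j, c in enumerate(classes):
        Y[labels == c, j] = 1.0
    # Iterate F <- alpha * S F + (1 - alpha) * Y until (approximate) convergence.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return classes[F.argmax(1)]
```

In ReLaB's setting the propagated labels are still noisy, which is why the abstract's third step filters them with a label noise detection algorithm before semi-supervised training.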