Multi-Label Contrastive Learning for Abstract Visual Reasoning

Paper Title

Multi-Label Contrastive Learning for Abstract Visual Reasoning

Authors

Mikołaj Małkiński, Jacek Mańdziuk

Abstract

For a long time, the ability to solve abstract reasoning tasks was considered one of the hallmarks of human intelligence. Recent advances in the application of deep learning (DL) methods have led, as in many other domains, to surpassing human abstract reasoning performance, specifically in the most popular type of such problems: Raven's Progressive Matrices (RPMs). While the efficacy of DL systems is indeed impressive, the way they approach RPMs is very different from that of humans. State-of-the-art systems for solving RPMs rely on massive pattern-based training and sometimes on exploiting biases in the dataset, whereas humans concentrate on identifying the rules / concepts underlying the RPM (or, more generally, the visual reasoning task) to be solved. Motivated by this cognitive difference, this work aims at combining DL with the human way of solving RPMs and getting the best of both worlds. Specifically, we cast the problem of solving RPMs into a multi-label classification framework where each RPM is viewed as a multi-label data point, with labels determined by the set of abstract rules underlying the RPM. For efficient training of the system, we introduce a generalisation of the Noise Contrastive Estimation algorithm to the case of multi-label samples. Furthermore, we propose a new sparse rule encoding scheme for RPMs which, besides the new training algorithm, is the key factor contributing to the state-of-the-art performance. The proposed approach is evaluated on the two most popular benchmark datasets (Balanced-RAVEN and PGM) and on both of them demonstrates an advantage over the current state-of-the-art results. Contrary to applications of contrastive learning methods reported in other domains, the state-of-the-art performance reported in the paper is achieved with no need for large batch sizes or strong data augmentation.
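
To make the idea of contrastive learning over multi-label samples more concrete, here is a minimal sketch in plain Python. It is not the formulation from the paper, which the abstract does not spell out. It assumes one plausible reading: each sample carries a multi-hot vector of hypothetical rule labels, and for every label an anchor has, each other sample sharing that label is treated as a positive in an InfoNCE-style term. The function names, the cosine similarity, and the temperature value are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def multi_label_contrastive_loss(embeddings, labels, temperature=0.5):
    """Illustrative multi-label InfoNCE-style loss (an assumption, not the paper's loss).

    embeddings: list of equal-length float vectors, one per sample.
    labels: list of multi-hot lists (1 = the sample has that hypothetical rule label).
    For each anchor i, each label k of i, and each other sample j that also has k,
    we add -log softmax_j(sim(i, .) / temperature), then average all such terms.
    """
    n = len(embeddings)
    terms = []
    for i in range(n):
        # Scaled similarities from anchor i to every other sample.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        if not logits:
            continue
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        for k, has_label in enumerate(labels[i]):
            if not has_label:
                continue
            for j, logit in logits.items():
                if labels[j][k]:
                    # -log( exp(logit) / sum_a exp(logit_a) )
                    terms.append(log_denom - logit)
    # With no positive pairs at all, the loss is defined here as 0.0.
    return sum(terms) / len(terms) if terms else 0.0


# Toy data: two hypothetical rule labels, four samples.
toy_labels = [[1, 0], [1, 0], [0, 1], [0, 1]]
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

loss_aligned = multi_label_contrastive_loss(aligned, toy_labels)
loss_misaligned = multi_label_contrastive_loss(misaligned, toy_labels)
```

In this toy setup the loss is lower when samples sharing a label have similar embeddings than when they do not, which is the behaviour a contrastive objective is meant to reward. A sample with several labels contributes one group of positive terms per label, which is the only sense in which this sketch is "multi-label".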
