Paper Title
MDMLP: Image Classification from Scratch on Small Datasets with MLP
Paper Authors
Paper Abstract
The attention mechanism has become a go-to technique for natural language processing and computer vision tasks. Recently, the MLP-Mixer and other architectures based simply on multi-layer perceptrons (MLPs) have proven competitive with CNNs and attention-based models, opening a new research direction. However, the high capability of MLP-based networks relies heavily on large volumes of training data, and these networks lack explanation ability compared to the Vision Transformer (ViT) or ConvNets. When trained on small datasets, they usually achieve results inferior to ConvNets. To resolve this, we present (i) the multi-dimensional MLP (MDMLP), a conceptually simple and lightweight MLP-based architecture that nevertheless achieves SOTA when trained from scratch on small datasets; and (ii) the multi-dimensional MLP Attention Tool (MDAttnTool), a novel and efficient attention mechanism based on MLPs. Even without strong data augmentation, MDMLP achieves 90.90% accuracy on CIFAR10 with only 0.3M parameters, while the well-known MLP-Mixer achieves 85.45% with 17.1M parameters. In addition, the lightweight MDAttnTool highlights objects in images, indicating its explanation power. Our code is available at https://github.com/Amoza-Theodore/MDMLP.
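To give a concrete picture of what "multi-dimensional" MLP mixing could mean, the sketch below applies a small residual MLP along each axis of a patch-token grid (rows, columns, channels) in turn, in the spirit of MLP-Mixer's token/channel mixing. This is an illustrative assumption based only on the abstract, not the authors' implementation; the class names `AxisMLP` and `MultiDimBlock` and all hyperparameters are hypothetical, and the official code lives at the linked repository.

```python
# Hypothetical sketch of per-axis MLP mixing -- NOT the official MDMLP code
# (see https://github.com/Amoza-Theodore/MDMLP for the authors' version).
# Assumes tokens of shape (batch, H_patches, W_patches, channels).
import torch
import torch.nn as nn


class AxisMLP(nn.Module):
    """Two-layer MLP applied along one axis of a 4D token tensor."""

    def __init__(self, dim_size: int, axis: int, expansion: int = 2):
        super().__init__()
        self.axis = axis
        self.mlp = nn.Sequential(
            nn.Linear(dim_size, dim_size * expansion),
            nn.GELU(),
            nn.Linear(dim_size * expansion, dim_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move the target axis to the last position, mix with a residual
        # connection, then move the axis back.
        x = x.movedim(self.axis, -1)
        x = x + self.mlp(x)
        return x.movedim(-1, self.axis)


class MultiDimBlock(nn.Module):
    """One block mixing patch rows, patch columns, and channels in turn."""

    def __init__(self, h: int, w: int, c: int):
        super().__init__()
        self.norm = nn.LayerNorm(c)
        self.mix_h = AxisMLP(h, axis=1)  # mix across patch rows
        self.mix_w = AxisMLP(w, axis=2)  # mix across patch columns
        self.mix_c = AxisMLP(c, axis=3)  # mix across channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.norm(x)
        return self.mix_c(self.mix_w(self.mix_h(x)))


if __name__ == "__main__":
    # CIFAR10-like setting: an 8x8 grid of patch tokens with 64 channels.
    tokens = torch.randn(2, 8, 8, 64)
    block = MultiDimBlock(h=8, w=8, c=64)
    print(block(tokens).shape)  # torch.Size([2, 8, 8, 64])
```

Mixing each axis with its own small MLP, rather than flattening patches into one long token sequence, keeps the weight matrices tiny (each scales with one dimension's size, not their product), which is consistent with the abstract's 0.3M-parameter figure, though the actual parameter accounting is the paper's, not this sketch's.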