Paper Title
Multi-branch and Multi-scale Attention Learning for Fine-Grained Visual Categorization
Authors
Abstract
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is one of the most authoritative academic competitions in the field of Computer Vision (CV) in recent years. However, directly applying ILSVRC's annual champion models to fine-grained visual categorization (FGVC) tasks does not achieve good performance. For FGVC tasks, the small inter-class variations and the large intra-class variations make classification a challenging problem. Our attention object location module (AOLM) can predict the position of the object, and our attention part proposal module (APPM) can propose informative part regions, without the need for bounding-box or part annotations. The obtained object images not only contain almost the entire structure of the object but also contain more details; the part images have many different scales and more fine-grained features; and the raw images contain the complete object. These three kinds of training images are supervised by our multi-branch network. Therefore, our multi-branch and multi-scale learning network (MMAL-Net) has good classification ability and robustness for images of different scales. Our approach can be trained end-to-end, while providing short inference time. Comprehensive experiments demonstrate that our approach achieves state-of-the-art results on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets. Our code will be available at https://github.com/ZF1044404254/MMAL-Net
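To make the localization idea behind AOLM concrete, the following is a minimal, hypothetical sketch of its core step: threshold an aggregated activation map at its mean value and take the bounding box of the above-mean region as the predicted object location. This is an illustrative simplification in plain Python (the function name `aolm_bbox` and the toy input are assumptions, not from the paper); the actual module also aggregates feature maps across channels and layers and keeps only the largest connected component before cropping.

```python
def aolm_bbox(activation):
    """Return (row_min, col_min, row_max, col_max) of the region whose
    activation exceeds the map's mean, or None if no cell qualifies.

    `activation` is a 2D list standing in for a channel-aggregated
    CNN activation map; real AOLM works on conv feature maps and
    additionally extracts the largest connected component.
    """
    h, w = len(activation), len(activation[0])
    mean = sum(sum(row) for row in activation) / (h * w)
    # Keep coordinates of cells with above-mean activation.
    coords = [(r, c) for r in range(h) for c in range(w)
              if activation[r][c] > mean]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    # Bounding box of the surviving region = predicted object location.
    return (min(rows), min(cols), max(rows), max(cols))


# Toy example: a 4x4 map with a bright 2x2 center "object".
act = [[0, 0, 0, 0],
       [0, 5, 6, 0],
       [0, 7, 8, 0],
       [0, 0, 0, 0]]
print(aolm_bbox(act))  # (1, 1, 2, 2)
```

The box found this way is then used to crop the object image for the object branch, and the same cropped region is further scanned by APPM to rank candidate part windows by activation.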