Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Paper Title

Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

Authors

Ionut Cosmin Duta, Li Liu, Fan Zhu, Ling Shao

Abstract

This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales. PyConv contains a pyramid of kernels, where each level involves different types of filters with varying size and depth, which are able to capture different levels of detail in the scene. On top of these improved recognition capabilities, PyConv is also efficient and, with our formulation, it does not increase the computational cost or the number of parameters compared to standard convolution. Moreover, it is very flexible and extensible, providing a large space of potential network architectures for different applications. PyConv has the potential to impact nearly every computer vision task and, in this work, we present different architectures based on PyConv for four main tasks in visual recognition: image classification, video action classification/recognition, object detection, and semantic image segmentation/parsing. Our approach shows significant improvements on all these core tasks in comparison with the baselines. For instance, on image recognition, our 50-layer network outperforms its 152-layer baseline counterpart, ResNet, in recognition performance on the ImageNet dataset, while having 2.39 times fewer parameters, 2.52 times lower computational complexity, and more than 3 times fewer layers. On image segmentation, our novel framework sets a new state of the art on the challenging ADE20K benchmark for scene parsing. Code is available at: https://github.com/iduta/pyconv
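
The claim that a pyramid of kernels need not cost more than a standard convolution can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: the channel count, the four kernel sizes, and the group counts are hypothetical values chosen for the example, not figures taken from the abstract, and the helper names (Level, conv_params, pyramid_params) are invented here. It assumes each pyramid level is a grouped convolution, so that larger kernels see fewer input channels per filter (the "varying depth" mentioned above).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One level of a hypothetical kernel pyramid."""
    kernel_size: int   # spatial size k of the k x k kernel
    out_channels: int  # number of output feature maps produced by this level
    groups: int        # grouped convolution: each filter sees in_channels / groups maps


def conv_params(in_channels: int, out_channels: int, kernel_size: int, groups: int = 1) -> int:
    """Weight count of a 2D convolution, ignoring bias terms."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel_size * kernel_size * (in_channels // groups) * out_channels


def pyramid_params(in_channels: int, levels: list[Level]) -> int:
    """Total weight count when all levels run in parallel on the same input."""
    return sum(
        conv_params(in_channels, lvl.out_channels, lvl.kernel_size, lvl.groups)
        for lvl in levels
    )


# Assumed configuration: 64 input maps, 64 output maps split evenly over 4 levels.
IN_CHANNELS = 64
LEVELS = [
    Level(kernel_size=3, out_channels=16, groups=1),
    Level(kernel_size=5, out_channels=16, groups=4),
    Level(kernel_size=7, out_channels=16, groups=8),
    Level(kernel_size=9, out_channels=16, groups=16),
]

standard = conv_params(IN_CHANNELS, out_channels=64, kernel_size=3)
pyramid = pyramid_params(IN_CHANNELS, LEVELS)

print(f"standard 3x3 convolution: {standard} weights")
print(f"4-level kernel pyramid:   {pyramid} weights")
```

With these assumed settings the pyramid uses 27072 weights against 36864 for the single 3x3 convolution. For a fixed output resolution, the multiply-accumulate count scales with the weight count, so the same ratio applies to computational cost. Other group or channel choices would shift the balance, which is why the configuration is a design decision rather than a fixed property.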
