Paper Title
TorchSparse: Efficient Point Cloud Inference Engine
Paper Authors
Abstract
Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized, and fused locality-aware memory access, reducing the memory movement cost by 2.7x. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.
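The two ideas the abstract names — sparse convolution as gather-GEMM-scatter, and grouping small matrix multiplications into one regular batched GEMM at the cost of some redundant computation — can be illustrated with a minimal NumPy sketch. This is not TorchSparse's actual CUDA implementation; the function names, the per-offset `in_maps`/`out_maps` index lists, and the padding-based grouping strategy are simplified assumptions for illustration.

```python
import numpy as np

def sparse_conv_gather_gemm_scatter(feats, weights, in_maps, out_maps, n_out):
    """Sketch of sparse convolution as gather-GEMM-scatter (illustrative).

    feats:    (N_in, C_in) features of the active (non-empty) input points
    weights:  (K, C_in, C_out) one weight matrix per kernel offset
    in_maps:  per-offset lists of input indices that contribute at that offset
    out_maps: per-offset lists of matching output indices
    """
    out = np.zeros((n_out, weights.shape[-1]), dtype=feats.dtype)
    for k in range(weights.shape[0]):
        if len(in_maps[k]) == 0:
            continue
        gathered = feats[in_maps[k]]          # gather: irregular memory access
        partial = gathered @ weights[k]       # GEMM: dense, regular compute
        np.add.at(out, out_maps[k], partial)  # scatter-accumulate into outputs
    return out

def grouped_gemm(gathered_list, weights):
    """Illustrates the grouping trade-off: pad each per-offset gather to a
    common row count so all K small matmuls become one regular batched GEMM.
    The padding rows are wasted FLOPs, traded for better regularity."""
    max_rows = max(g.shape[0] for g in gathered_list)
    padded = np.stack([
        np.pad(g, ((0, max_rows - g.shape[0]), (0, 0))) for g in gathered_list
    ])                          # (K, max_rows, C_in)
    batched = padded @ weights  # single batched matmul over all offsets
    # Slice the padding back off each per-offset result.
    return [batched[k, :g.shape[0]] for k, g in enumerate(gathered_list)]
```

In the loop version, each kernel offset launches its own small, differently-sized matmul; `grouped_gemm` shows why padding to a shared size can win on GPUs, where one large regular kernel launch often beats many irregular small ones even though some of the computed rows are discarded.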