Paper Title

StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks

Authors

Hongyu Li, Zhengang Li, Neset Unver Akmandor, Huaizu Jiang, Yanzhi Wang, Taskin Padir

Abstract

Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most previous obstacle detection works leverage only traditional stereo matching techniques to meet the computational constraints of real-time feedback. This paper proposes a computationally efficient method that employs a deep neural network to detect occupancy directly from stereo images. Instead of learning the point cloud correspondence from the stereo data, our approach extracts a compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety-irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on an onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately in the range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between stereo-based systems in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36-hour stereo dataset with a mobile robot, which is used to fine-tune our model. The dataset, the code, and further details, including additional visualizations, are available at https://lhy.xyz/stereovoxelnet
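The coarse-to-fine octree pruning described in the abstract can be illustrated with a minimal sketch: starting from a coarse occupancy grid, only voxels marked occupied are subdivided and re-queried at the next level, so free (safety-irrelevant) space costs no further computation. This is an assumption-laden toy, not the paper's implementation; `predict_children` stands in for one decoder level and is hypothetical.

```python
import numpy as np

def refine_occupancy(coarse, predict_children):
    """Subdivide only the occupied coarse voxels (2x per axis).

    `predict_children(z, y, x)` is a hypothetical stand-in for one
    octree level of the decoder: it returns a 2x2x2 boolean block of
    finer occupancy for the given coarse cell. Free cells are pruned,
    i.e., never queried.
    """
    d, h, w = coarse.shape
    fine = np.zeros((2 * d, 2 * h, 2 * w), dtype=bool)
    for z, y, x in np.argwhere(coarse):  # iterate occupied cells only
        block = predict_children(z, y, x)
        fine[2 * z:2 * z + 2, 2 * y:2 * y + 2, 2 * x:2 * x + 2] = block
    return fine

# Toy example: one occupied coarse voxel out of eight; the fake decoder
# marks all of its children occupied.
coarse = np.zeros((2, 2, 2), dtype=bool)
coarse[0, 0, 0] = True
fine = refine_occupancy(coarse, lambda z, y, x: np.ones((2, 2, 2), dtype=bool))
print(int(fine.sum()))  # 8 occupied fine voxels; the other 56 were pruned
```

With a real decoder, this loop would repeat per octree level, so runtime scales with the number of occupied voxels rather than the full grid volume.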
