Swin-Transformer-YOLOv5 for Real-Time Wine Grape Bunch Detection

Paper Title

Swin-transformer-yolov5 For Real-time Wine Grape Bunch Detection

Authors

Lu, Shenglian; Liu, Xiaoyu; He, Zixaun; Liu, Wenbo; Zhang, Xin; Karkee, Manoj

Abstract

In this research, an integrated detection model, Swin-Transformer-YOLOv5 (Swin-T-YOLOv5), was proposed for real-time wine grape bunch detection, combining the advantages of YOLOv5 and Swin-Transformer. The research was conducted from July to September 2019 on two grape varieties: Chardonnay (berry skin always white) and Merlot (berry skin white or mixed white-red when immature, red when mature). To verify the superiority of Swin-T-YOLOv5, its performance was compared against several commonly used, competitive object detectors: Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5. For a comprehensive comparison, all models were assessed under different test conditions: two weather conditions (sunny and cloudy), two berry maturity stages (immature and mature), and three sunlight directions/intensities (morning, noon, and afternoon). Additionally, the number of grape bunches predicted by Swin-T-YOLOv5 was compared with ground truth values obtained from both in-field manual counting and manual labeling during the annotation process. Results showed that the proposed Swin-T-YOLOv5 outperformed all other studied models for grape bunch detection, reaching a mean average precision (mAP) of up to 97% and an F1-score of 0.89 under cloudy weather. This mAP was approximately 44%, 18%, 14%, and 4% greater than that of Faster R-CNN, YOLOv3, YOLOv4, and YOLOv5, respectively. Swin-T-YOLOv5 achieved its lowest mAP (90%) and F1-score (0.82) when detecting immature berries, where its mAP was still approximately 40%, 5%, 3%, and 1% greater than that of the same four models, respectively. Furthermore, when its predictions were compared with ground truth, Swin-T-YOLOv5 performed better on the Chardonnay variety, achieving an R² of up to 0.91 and a root mean square error (RMSE) of 2.36. However, it underperformed on the Merlot variety, achieving an R² of only up to 0.70 and an RMSE of 3.30.
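For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal Python sketch of how an F1-score, an RMSE, and an R² value can be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the bunch counts are hypothetical, and R² is taken here as the coefficient of determination (1 - SS_res/SS_tot), whereas the paper may instead report the R² of a fitted regression line.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rmse(truth, pred):
    """Root mean square error between ground-truth and predicted counts."""
    squared_errors = [(t - p) ** 2 for t, p in zip(truth, pred)]
    return math.sqrt(sum(squared_errors) / len(truth))


def r_squared(truth, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_truth = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean_truth) ** 2 for t in truth)
    return 1 - ss_res / ss_tot


# Hypothetical per-image bunch counts, made up for illustration only;
# these are not data from the paper.
manual_counts = [10, 12, 15, 20]      # e.g. in-field manual counting
predicted_counts = [9, 13, 14, 18]    # e.g. detector output

print("RMSE:", rmse(manual_counts, predicted_counts))
print("R2:  ", r_squared(manual_counts, predicted_counts))
print("F1:  ", f1_score(0.9, 0.8))
```

A lower RMSE and an R² closer to 1 indicate closer agreement between predicted and manually obtained bunch counts, which is the sense in which the abstract describes the Chardonnay results as better than the Merlot results.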
