Paper Title


Learning to Detect and Segment for Open Vocabulary Object Detection

Authors

Tao Wang, Nan Li

Abstract


Open-vocabulary object detection has been greatly advanced by the recent development of vision-language pretrained models, which help recognize novel objects given only their semantic categories. Prior works mainly focus on transferring knowledge to object proposal classification and employ class-agnostic box and mask prediction. In this work, we propose CondHead, a principled dynamic network design that better generalizes box regression and mask segmentation to the open-vocabulary setting. The core idea is to conditionally parameterize the network heads on semantic embeddings, so that the model is guided by class-specific knowledge to better detect novel categories. Specifically, CondHead is composed of two streams of network heads: the dynamically aggregated head and the dynamically generated head. The former is instantiated as a set of static heads that are conditionally aggregated; these heads are optimized as experts and are expected to learn sophisticated prediction. The latter is instantiated with dynamically generated parameters and encodes general class-specific information. With this conditional design, the detection model is bridged by the semantic embeddings to offer strongly generalizable class-wise box and mask prediction. Our method brings significant improvement to state-of-the-art open-vocabulary object detection methods with very minor overhead; for example, it surpasses a RegionCLIP model by 3.0 detection AP on novel categories with only 1.1% more computation.
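The two-stream conditional design described in the abstract can be illustrated with a minimal sketch. The code below is a hypothetical reconstruction, not the authors' implementation: all dimensions, the softmax-based expert aggregation, the linear generated head, and the final fusion by simple summation are assumptions made for illustration.

```python
import numpy as np

class CondHeadSketch:
    """Hypothetical sketch of a CondHead-style conditional box-regression head.

    Two streams, both conditioned on a class semantic embedding:
      * dynamically aggregated head: K static "expert" heads whose outputs
        are mixed with weights predicted from the embedding;
      * dynamically generated head: a linear head whose parameters are
        generated on the fly from the embedding.
    Dimensions and the sum-based fusion are illustrative assumptions.
    """

    def __init__(self, feat_dim=256, embed_dim=512, num_experts=4, out_dim=4, seed=0):
        rng = np.random.default_rng(seed)
        # K static expert heads (plain linear layers).
        self.experts = [rng.standard_normal((feat_dim, out_dim)) * 0.01
                        for _ in range(num_experts)]
        # Maps the semantic embedding to K aggregation logits.
        self.agg_w = rng.standard_normal((embed_dim, num_experts)) * 0.01
        # Maps the semantic embedding to the parameters of a generated linear head.
        self.gen_w = rng.standard_normal((embed_dim, feat_dim * out_dim + out_dim)) * 0.01
        self.feat_dim, self.out_dim = feat_dim, out_dim

    def __call__(self, roi_feat, sem_embed):
        # roi_feat: (N, feat_dim) region features; sem_embed: (N, embed_dim).
        # Stream 1: softmax-weighted aggregation of the expert outputs.
        logits = sem_embed @ self.agg_w
        w = np.exp(logits - logits.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)                                    # (N, K)
        expert_out = np.stack([roi_feat @ e for e in self.experts], -1)  # (N, out, K)
        aggregated = (expert_out * w[:, None, :]).sum(-1)                # (N, out)

        # Stream 2: per-sample linear head generated from the embedding.
        params = sem_embed @ self.gen_w
        W = params[:, :self.feat_dim * self.out_dim].reshape(-1, self.feat_dim, self.out_dim)
        b = params[:, self.feat_dim * self.out_dim:]
        generated = np.einsum("nf,nfo->no", roi_feat, W) + b             # (N, out)

        return aggregated + generated  # assumed fusion: simple sum
```

Because both streams are parameterized by the semantic embedding, the same weights can in principle produce class-specific box predictions for novel categories whose embeddings were never seen during training.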
