Paper title
Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation
Paper authors
Abstract
Video Panoptic Segmentation (VPS) aims to generate coherent panoptic segmentation and to track the identities of all pixels across video frames. Existing methods predominantly rely on trained instance embeddings to keep the panoptic segmentation consistent. However, they inevitably struggle with small objects, instances that look alike but have different identities, occlusion, and strong deformation of instance contours. To address these problems, we present HybridTracker, a lightweight joint tracking model that attempts to eliminate the limitations of a single tracker. HybridTracker runs a pixel tracker and an instance tracker in parallel to obtain the association matrices, which are fused into a matching matrix. In the instance tracker, we design a differentiable matching layer that ensures the stability of inter-frame matching. In the pixel tracker, we compute the Dice coefficient of the same instance across different frames given the estimated optical flow, forming the Intersection over Union (IoU) matrix. We additionally propose mutual check and temporal consistency constraints during inference to handle the occlusion and contour deformation challenges. Comprehensive experiments show that HybridTracker outperforms state-of-the-art methods on the Cityscapes-VPS and VIPER datasets.
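The abstract names the pipeline's ingredients (a Dice-based pixel association matrix, fusion with an instance association matrix, and a mutual check) but not their exact form. The sketch below is one possible reading in plain Python, not the authors' implementation. It assumes the previous-frame masks have already been warped into the current frame by the estimated optical flow. The equal-weight fusion, the 0.5 threshold, the function names, and the instance similarity values are hypothetical choices made for illustration only.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of one mask pixel
Matrix = List[List[float]]


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two pixel sets."""
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 0.0


def dice_matrix(prev: List[Set[Pixel]], curr: List[Set[Pixel]]) -> Matrix:
    """Pixel association: entry [i][j] compares previous mask i with current mask j."""
    return [[dice(p, c) for c in curr] for p in prev]


def fuse(pixel_m: Matrix, inst_m: Matrix, weight: float = 0.5) -> Matrix:
    """Hypothetical fusion rule: element-wise convex combination of both matrices."""
    return [
        [weight * p + (1.0 - weight) * q for p, q in zip(p_row, q_row)]
        for p_row, q_row in zip(pixel_m, inst_m)
    ]


def mutual_matches(m: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Keep (i, j) only if j is the best column for row i AND i is the best row for column j."""
    if not m or not m[0]:
        return []
    matches = []
    for i, row in enumerate(m):
        j = max(range(len(row)), key=row.__getitem__)
        best_i = max(range(len(m)), key=lambda r: m[r][j])
        if best_i == i and row[j] >= threshold:
            matches.append((i, j))
    return matches


# Toy data: two instances in the (flow-warped) previous frame, two in the current frame.
prev_masks = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(3, 3), (3, 4)}]
curr_masks = [{(3, 3), (3, 4), (4, 4)}, {(0, 1), (1, 1), (0, 2), (1, 2)}]

pixel_m = dice_matrix(prev_masks, curr_masks)
inst_m = [[0.1, 0.9], [0.7, 0.2]]  # made-up instance similarity scores, illustration only
fused = fuse(pixel_m, inst_m)
matches = mutual_matches(fused)
print(matches)
```

In this toy case previous instance 0 is matched to current instance 1 and previous instance 1 to current instance 0, because each pair is the best candidate in both directions. A pair that is the best choice in only one direction would be dropped, which is the intuition behind using a mutual check when instances are occluded or ambiguous.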