论文标题
在复杂的驾驶场景中仅根据每个类带一个带注释的像素来实现像素级的语义学习
Realizing Pixel-Level Semantic Learning in Complex Driving Scenes based on Only One Annotated Pixel per Class
论文作者
论文摘要
已经提出了基于弱监督条件的语义细分任务,以实现轻巧的标签过程。对于仅包含几类的简单图像,基于图像级注释的研究已获得可接受的性能。但是,当面对复杂场景时,由于图像包含大量类,因此很难根据图像标签学习视觉外观。在这种情况下,图像级注释在提供信息方面无效。因此,我们设置了一个新任务,其中仅为每个类别提供一个带注释的像素。基于更轻巧和信息性的条件,为伪标签生成构建了三个步骤,该过程逐步实施了每个类别,基于图像推理和基于上下文位置的精炼的最佳特征表示。特别是,由于在驾驶场景下,高级语义和低级成像功能对每个类都具有不同的判别能力,因此我们将每个类别分为“对象”或“场景”,然后分别为两种类型提供不同的操作。此外,建立了替代迭代结构以逐渐改善分割性能,该结合结合了基于CNN的间图共同的语义学习和成像先验的基于图像内图像内的修改过程。 CityScapes数据集的实验表明,所提出的方法提供了一种在复杂的驾驶场景下解决弱监督的语义细分任务的可行方法。
Semantic segmentation tasks based on weakly supervised condition have been put forward to achieve a lightweight labeling process. For simple images that only include a few categories, researches based on image-level annotations have achieved acceptable performance. However, when facing complex scenes, since image contains a large amount of classes, it becomes difficult to learn visual appearance based on image tags. In this case, image-level annotations are not effective in providing information. Therefore, we set up a new task in which only one annotated pixel is provided for each category. Based on the more lightweight and informative condition, a three step process is built for pseudo labels generation, which progressively implement optimal feature representation for each category, image inference and context-location based refinement. In particular, since high-level semantics and low-level imaging feature have different discriminative ability for each class under driving scenes, we divide each category into "object" or "scene" and then provide different operations for the two types separately. Further, an alternate iterative structure is established to gradually improve segmentation performance, which combines CNN-based inter-image common semantic learning and imaging prior based intra-image modification process. Experiments on Cityscapes dataset demonstrate that the proposed method provides a feasible way to solve weakly supervised semantic segmentation task under complex driving scenes.