Paper Title

Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings

Authors

Jaskirat Singh, Liang Zheng

Abstract

Generation of stroke-based non-photorealistic imagery, is an important problem in the computer vision community. As an endeavor in this direction, substantial recent research efforts have been focused on teaching machines "how to paint", in a manner similar to a human painter. However, the applicability of previous methods has been limited to datasets with little variation in position, scale and saliency of the foreground object. As a consequence, we find that these methods struggle to cover the granularity and diversity possessed by real world images. To this end, we propose a Semantic Guidance pipeline with 1) a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time. 2) We also introduce invariance to the position and scale of the foreground object through a neural alignment model, which combines object localization and spatial transformer networks in an end to end manner, to zoom into a particular semantic instance. 3) The distinguishing features of the in-focus object are then amplified by maximizing a novel guided backpropagation based focus reward. The proposed agent does not require any supervision on human stroke-data and successfully handles variations in foreground object attributes, thus, producing much higher quality canvases for the CUB-200 Birds and Stanford Cars-196 datasets. Finally, we demonstrate the further efficacy of our method on complex datasets with multiple foreground object instances by evaluating an extension of our method on the challenging Virtual-KITTI dataset. Source code and models are available at https://github.com/1jsingh/semantic-guidance.
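The neural alignment model described in point 2 combines object localization with a spatial transformer network to zoom into a semantic instance. As a rough illustration of the underlying idea (a minimal NumPy sketch under our own assumptions, not the authors' implementation), a spatial transformer's 2x3 affine matrix can be built directly from a detected bounding box in the normalized [-1, 1] coordinates that sampling grids conventionally use:

```python
import numpy as np

def zoom_theta(box):
    """Build a 2x3 affine matrix that makes a spatial transformer
    sample only the region inside `box`, i.e. zoom into the object.

    box: (x_min, y_min, x_max, y_max) in normalized [-1, 1] coords.
    """
    x0, y0, x1, y1 = box
    sx, sy = (x1 - x0) / 2.0, (y1 - y0) / 2.0   # scale: box half-extent
    tx, ty = (x1 + x0) / 2.0, (y1 + y0) / 2.0   # translation: box center
    return np.array([[sx, 0.0, tx],
                     [0.0, sy, ty]])

def apply_theta(theta, pts):
    """Map output-grid points (N, 2) through the affine transform."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # append 1 for translation
    return homog @ theta.T

# The corners of the full output grid should land on the box corners,
# so sampling with this theta crops/zooms to the object region.
theta = zoom_theta((-0.5, -0.5, 0.5, 1.0))
corners = np.array([[-1.0, -1.0], [1.0, 1.0]])
print(apply_theta(theta, corners))  # maps to [[-0.5, -0.5], [0.5, 1.0]]
```

In a full pipeline the box would come from the localization network, and the same affine parameters would drive a differentiable sampler (e.g. an affine grid plus bilinear sampling), keeping the whole alignment step trainable end to end.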
