Paper Title

FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

Authors

Pinaki Nath Chowdhury, Aneeshan Sain, Ayan Kumar Bhunia, Tao Xiang, Yulia Gryaditskaya, Yi-Zhe Song

Abstract

We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any sketching skills. Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information by 100 non-expert individuals, offering both object- and scene-level abstraction. Each sketch is augmented with its text description. Using our dataset, we study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions. We draw insights on: (i) scene salience encoded in sketches using the strokes' temporal order; (ii) performance comparison of image retrieval from a scene sketch and an image caption; (iii) complementarity of information in sketches and image captions, as well as the potential benefit of combining the two modalities. In addition, we extend a popular LSTM-based vector sketch encoder to handle sketches of greater complexity than previous work supported. Namely, we propose a hierarchical sketch decoder, which we leverage in a sketch-specific "pre-text" task. Our dataset enables, for the first time, research on freehand scene sketch understanding and its practical applications.
