Title
Interactive Image Manipulation with Complex Text Instructions
Authors
Abstract
Recently, text-guided image manipulation has received increasing attention in multimedia processing and computer vision research because of its high flexibility and controllability. Its goal is to semantically manipulate parts of an input reference image according to a text description. However, most existing works suffer from the following problems: (1) text-irrelevant content is not always preserved and may be changed randomly, (2) the quality of image manipulation still needs further improvement, and (3) only descriptive attributes can be manipulated. To solve these problems, we propose a novel image manipulation method that interactively edits an image using complex text instructions. It not only allows users to improve the accuracy of image manipulation but also enables complex tasks such as enlarging, shrinking, or removing objects and replacing the background with the input image. To make these tasks possible, we apply three strategies. First, the given image is divided into text-relevant content and text-irrelevant content; only the text-relevant content is manipulated, so the text-irrelevant content is preserved. Second, a super-resolution method is used to enlarge the manipulation region, which further improves operability and helps manipulate the object itself. Third, a user interface is introduced for editing the segmentation map interactively, so that the generated image can be refined according to the user's needs. Extensive experiments on the Caltech-UCSD Birds-200-2011 (CUB) dataset and the Microsoft Common Objects in Context (MS COCO) dataset demonstrate that our proposed method enables interactive, flexible, and accurate image manipulation in real time. Through qualitative and quantitative evaluations, we show that the proposed model outperforms other state-of-the-art methods.
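The first two strategies can be illustrated with a minimal sketch of generic mask-based editing. This is not the authors' implementation. It assumes a binary mask marks the text-relevant region (1) and the text-irrelevant region (0), and it uses nearest-neighbour upscaling as a stand-in for the super-resolution model. Images are plain nested lists of grayscale values for simplicity, and the helper names `composite`, `mask_bbox`, `crop`, and `upscale_nearest` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical types: a grayscale image and a binary mask, both H x W nested lists.
Image = List[List[float]]
Mask = List[List[int]]  # 1 = text-relevant, 0 = text-irrelevant


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Take text-relevant pixels from the generated image and keep all
    text-irrelevant pixels from the original, so they cannot change."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("image and mask heights differ")
    out: Image = []
    for row_o, row_g, row_m in zip(original, generated, mask):
        if not (len(row_o) == len(row_g) == len(row_m)):
            raise ValueError("image and mask widths differ")
        out.append([g if m else o for o, g, m in zip(row_o, row_g, row_m)])
    return out


def mask_bbox(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_start, row_end, col_start, col_end) of the text-relevant
    region, with exclusive ends, or None if the mask is empty."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop(img: Image, bbox: Tuple[int, int, int, int]) -> Image:
    """Cut out the manipulation region given by a bounding box."""
    r0, r1, c0, c1 = bbox
    return [row[c0:c1] for row in img[r0:r1]]


def upscale_nearest(img: Image, factor: int) -> Image:
    """Enlarge a region by an integer factor. Nearest-neighbour repetition is
    only a placeholder here for a learned super-resolution model."""
    return [[v for v in row for _ in range(factor)] for row in img for _ in range(factor)]
```

A hard binary mask guarantees that every text-irrelevant pixel is copied unchanged. A soft mask would instead blend the two images near region boundaries.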