LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

Paper Title

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

Authors

Ning Yu, Chia-Chih Chen, Zeyuan Chen, Rui Meng, Gang Wu, Paul Josel, Juan Carlos Niebles, Caiming Xiong, Ran Xu

Abstract

Graphic layout designs play an essential role in visual communication. Yet handcrafting layout designs is skill-demanding, time-consuming, and non-scalable to batch production. Generative models emerge to make design automation scalable but it remains non-trivial to produce designs that comply with designers' multimodal desires, i.e., constrained by background images and driven by foreground content. We propose LayoutDETR that inherits the high quality and realism from generative modeling, while reformulating content-aware requirements as a detection problem: we learn to detect in a background image the reasonable locations, scales, and spatial relations for multimodal foreground elements in a layout. Our solution sets a new state-of-the-art performance for layout generation on public benchmarks and on our newly-curated ad banner dataset. We integrate our solution into a graphical system that facilitates user studies, and show that users prefer our designs over baselines by significant margins. Code, models, dataset, and demos are available at https://github.com/salesforce/LayoutDETR.
