FETA: Towards Specializing Foundation Models for Expert Task Applications

Paper title

FETA: Towards Specializing Foundation Models for Expert Task Applications

Paper authors

Alfassy, Amit; Arbelle, Assaf; Halimi, Oshri; Harary, Sivan; Herzig, Roei; Schwartz, Eli; Panda, Rameswar; Dolfi, Michele; Auer, Christoph; Saenko, Kate; Staar, Peter W. J.; Feris, Rogerio; Karlinsky, Leonid

Paper abstract

Foundation Models (FMs) have demonstrated unprecedented capabilities, including zero-shot learning, high-fidelity data synthesis, and out-of-domain generalization. However, as we show in this paper, FMs still have poor out-of-the-box performance on expert tasks (e.g., retrieving technical illustrations from car manuals given language queries), for which the data is either unseen or belongs to a long-tail part of the distribution of the huge datasets used for FM pre-training. This underlines the necessity of explicitly evaluating and fine-tuning FMs on such expert tasks, arguably the ones that appear most often in practical real-world applications. In this paper, we propose FETA, a first-of-its-kind benchmark built around the task of teaching FMs to understand technical documentation by learning to match its graphical illustrations to the corresponding language descriptions. FETA focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures. FETA is equipped with a procedure for completely automatic annotation extraction (code will be released upon acceptance), allowing easy extension of FETA to more documentation types and application domains in the future. Our automatic annotation yields an automated performance metric that is shown to be consistent with metrics computed on human-curated annotations (also released). We provide multiple baselines and an analysis of popular FMs on FETA, leading to several interesting findings that we believe will be valuable to the FM community, paving the way towards real-world application of FMs to practical expert tasks currently 'overlooked' by standard benchmarks focusing on common objects.
