Paper Title
LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Paper Authors
Paper Abstract
Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding layer with an image patch embedding layer, the word token output layer with a 10-way output layer, and the word prediction loss with a 10-way classification loss, respectively. A natural question arises: Can LM fine-tuning solve non-language downstream tasks without changing the model architecture or loss function? To answer this, we propose Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations by conducting an extensive empirical study on a suite of non-language classification and regression tasks. LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling "no-code machine learning with LMs." We find that LIFT performs comparably well across a wide range of low-dimensional classification and regression tasks, matching the performances of the best baselines in many cases, especially for the classification tasks. We also report experimental results on the fundamental properties of LIFT, including inductive bias, robustness, and sample complexity. We also analyze the effect of pretraining on LIFT and a few properties/techniques specific to LIFT, e.g., context-aware learning via appropriate prompting, calibrated predictions, data generation, and two-stage fine-tuning. Our code is available at https://github.com/UW-Madison-Lee-Lab/LanguageInterfacedFineTuning.
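The core idea of LIFT is to serialize a non-language sample (e.g., a row of tabular features) into a natural-language (prompt, completion) pair, so a pretrained LM can be fine-tuned on it with no architectural changes. The sketch below illustrates one plausible serialization; the exact template is defined in the paper's repository, and the function name, feature names, and prompt wording here are illustrative assumptions.

```python
# Hedged sketch of a LIFT-style serializer: one tabular sample becomes a
# (prompt, completion) text pair for standard LM fine-tuning. The template
# wording is an assumption for illustration, not the paper's exact format.

def to_lift_prompt(features, feature_names, label=None):
    """Serialize one tabular sample as natural-language text.

    features: numeric feature values for one sample
    feature_names: human-readable names for each feature (hypothetical)
    label: target class for training; None at inference time
    """
    parts = [f"{name} is {value}" for name, value in zip(feature_names, features)]
    prompt = "Given that " + ", ".join(parts) + ", what is the class?"
    # Completion is left empty at inference; the fine-tuned LM fills it in.
    completion = f" {label}" if label is not None else ""
    return prompt, completion


prompt, completion = to_lift_prompt(
    [5.1, 3.5, 1.4, 0.2],
    ["sepal length", "sepal width", "petal length", "petal width"],
    label="setosa",
)
```

Because both input and output are plain text, the same fine-tuning pipeline and loss (next-token prediction) apply unchanged, which is what enables the "no-code machine learning with LMs" framing in the abstract.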