In-Context Learning with Representations: Contextual Generalization of Trained Transformers

Yang, Tong; Huang, Yu; Liang, Yingbin; Chi, Yuejie

arXiv:2408.10147 (cs)

[Submitted on 19 Aug 2024 (v1), last revised 25 Sep 2024 (this version, v2)]

Abstract: In-context learning (ICL) refers to a remarkable capability of pretrained large language models, which can learn a new task given a few examples during inference. However, theoretical understanding of ICL remains largely under-explored, particularly whether transformers can be trained to generalize to unseen examples in a prompt, which requires the model to acquire contextual knowledge of the prompt. This paper investigates the training dynamics of transformers under gradient descent through the lens of non-linear regression tasks. Contextual generalization here can be attained by learning the template function for each task in context, where all template functions lie in a linear space spanned by $m$ basis functions. We analyze the training dynamics of one-layer multi-head transformers that predict unlabeled inputs in context given partially labeled prompts, where the labels contain Gaussian noise and the number of examples in each prompt is not sufficient to determine the template. Under mild assumptions, we show that the training loss for a one-layer multi-head transformer converges linearly to a global minimum. Moreover, the transformer effectively learns to perform ridge regression over the basis functions. To our knowledge, this study is the first provable demonstration that transformers can learn contextual (i.e., template) information to generalize to both unseen examples and tasks when prompts contain only a small number of query-answer pairs.
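
As a rough illustration of the target the abstract names, the sketch below runs plain ridge regression over $m$ basis functions on a prompt with fewer labeled examples than basis functions, then predicts the unlabeled inputs. It is not the paper's transformer or its training procedure. The cosine basis, the coefficient values, the noise level, the regularization strength `reg`, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def features(x, m):
    # Hypothetical basis 1, cos(x), cos(2x), ...; the paper's basis is not specified here.
    return [math.cos(k * x) for k in range(m)]

def solve(A, b):
    # Solve A z = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def ridge_fit(xs, ys, m, reg):
    # Coefficients (Phi^T Phi + reg * I)^{-1} Phi^T y over the m basis functions.
    Phi = [features(x, m) for x in xs]
    A = [[sum(row[i] * row[j] for row in Phi) + (reg if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, b)

def predict(coef, x):
    # Evaluate the fitted template at a (possibly unlabeled) input x.
    return sum(c * f for c, f in zip(coef, features(x, len(coef))))

random.seed(0)
m = 4
true_coef = [0.5, -1.0, 0.8, 0.3]   # assumed template coefficients
labeled_x = [0.2, 1.1, 2.5]         # 3 labeled examples, fewer than m = 4
labeled_y = [predict(true_coef, x) + random.gauss(0.0, 0.1) for x in labeled_x]

coef = ridge_fit(labeled_x, labeled_y, m, reg=0.1)
for x in [0.7, 1.9]:                # unlabeled inputs from the same prompt
    print(x, predict(coef, x))
```

Because `reg` is positive, the regularized system is solvable even though three labeled examples cannot pin down four coefficients, which mirrors the under-determined setting the abstract describes.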

Comments:	Accepted by NeurIPS 2024
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2408.10147 [cs.LG]
	(or arXiv:2408.10147v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2408.10147

Submission history

From: Yu Huang
[v1] Mon, 19 Aug 2024 16:47:46 UTC (492 KB)
[v2] Wed, 25 Sep 2024 19:16:16 UTC (492 KB)
