Activation Steering for Robust Type Prediction in CodeLLMs

Lucchetti, Francesca; Guha, Arjun


arXiv:2404.01903v1 (cs)

[Submitted on 2 Apr 2024 (this version), latest version 13 Sep 2024 (v2)]


Abstract: Contemporary LLMs pretrained on code are capable of succeeding at a wide variety of programming tasks. However, their performance is very sensitive to syntactic features, such as the names of variables and types, the structure of code, and the presence of type hints. We contribute an inference-time technique to make CodeLLMs more robust to syntactic distractors that are semantically irrelevant. Our methodology relies on activation steering, which involves editing internal model activations to steer the model towards the correct prediction. We contribute a novel way to construct steering vectors by taking inspiration from mutation testing, which constructs minimal semantics-breaking code edits. In contrast, we construct steering vectors from semantics-preserving code edits. We apply our approach to the task of type prediction for the gradually typed languages Python and TypeScript. This approach corrects up to 90% of type mispredictions. Finally, we show that steering vectors calculated from Python activations reliably correct type mispredictions in TypeScript, and vice versa. This result suggests that LLMs may be learning to transfer knowledge of types across programming languages.
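
Illustrative sketch (not from the paper): the abstract does not spell out how the steering vectors are computed, so the snippet below assumes the common difference-of-means construction, where the vector is the mean activation on original programs minus the mean activation on their semantics-preserving edits, and is added to a hidden state at inference time. The function names, the scale parameter, and the two-dimensional toy vectors standing in for real model activations are all hypothetical.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(original: List[Vector], edited: List[Vector]) -> Vector:
    """Assumed construction: mean(original activations) - mean(edited activations)."""
    mean_original = mean(original)
    mean_edited = mean(edited)
    return [o - e for o, e in zip(mean_original, mean_edited)]


def steer(activation: Vector, vector: Vector, scale: float = 1.0) -> Vector:
    """Add the scaled steering vector to one activation (hypothetical helper)."""
    return [a + scale * v for a, v in zip(activation, vector)]


# Toy activations: stand-ins for hidden states captured from a model
# on original programs and on their renamed-variable counterparts.
original_acts = [[1.0, 2.0], [3.0, 4.0]]
edited_acts = [[0.0, 2.0], [2.0, 2.0]]

vector = steering_vector(original_acts, edited_acts)
steered = steer([1.0, 1.0], vector)
```

In a real setting the activations would be captured from a chosen layer of the model and the steered value written back during the forward pass; which layers and token positions to use is a design choice this sketch leaves open.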

Comments:	16 pages, 7 figures
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
Cite as:	arXiv:2404.01903 [cs.CL]
	(or arXiv:2404.01903v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.01903

Submission history

From: Francesca Lucchetti
[v1] Tue, 2 Apr 2024 12:44:44 UTC (86 KB)
[v2] Fri, 13 Sep 2024 14:56:46 UTC (349 KB)
