Understanding How CodeLLMs (Mis)Predict Types with Activation Steering

Lucchetti, Francesca; Guha, Arjun

Computer Science > Computation and Language

arXiv:2404.01903 (cs)

[Submitted on 2 Apr 2024 (v1), last revised 13 Sep 2024 (this version, v2)]

Abstract: CodeLLMs are transforming software development as we know it. This is especially true for tasks where rule-based approaches fall short, such as type prediction. The type prediction task consists of adding a new type annotation to a partially typed program, such that the resulting program is closer to being fully typed. The intractability of rule-based approaches and the high cost of manual annotation make CodeLLMs an attractive solution to the problem. However, CodeLLMs are still far from being deployed at large scale because of doubts about their reliability.
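The notion of a program becoming "closer to being fully typed" can be made concrete with a small sketch. The example below is illustrative only and is not taken from the paper: the helper `count_missing_annotations` and the `repeat` function are hypothetical names, and the measure (unannotated parameters plus unannotated return types, ignoring `*args` and `**kwargs`) is an assumed simplification.

```python
import ast


def count_missing_annotations(source: str) -> int:
    """Count parameters and return types that have no annotation.

    Assumption: only ordinary function definitions are inspected, and
    *args / **kwargs are ignored to keep the sketch short.
    """
    missing = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            params = args.posonlyargs + args.args + args.kwonlyargs
            missing += sum(1 for p in params if p.annotation is None)
            if node.returns is None:
                missing += 1
    return missing


# Hypothetical partially typed program: `n` is annotated,
# while `s` and the return type are not.
partial = "def repeat(s, n: int):\n    return s * n\n"

# One type prediction step: a model proposes `str` for the parameter `s`.
predicted = "def repeat(s: str, n: int):\n    return s * n\n"

before = count_missing_annotations(partial)
after = count_missing_annotations(predicted)
print(before, after)
```

Under these assumptions, a single successful prediction lowers the count of missing annotations by one, which is the sense in which the annotated program is closer to fully typed.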
To shed light on how CodeLLMs approach type prediction, we investigate what happens when a model mispredicts a type. We show that applying semantics-preserving edits to code eventually misleads CodeLLMs into mispredicting type annotations. However, by leveraging activation steering, we are able to "steer" the model back to the correct prediction, making models more robust against semantically irrelevant prompt features. We show that steering achieves performance comparable to fine-tuning directly on the type prediction task. Furthermore, we find that steering vectors computed from Python code are effective at correcting TypeScript mispredictions, and vice versa. To our knowledge, this is the first evidence to suggest that CodeLLMs learn task representations that transfer across languages.
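As a rough illustration of activation steering, the sketch below uses one common formulation: the steering vector is the difference between the mean activation on prompts the model handles correctly and the mean activation on prompts it mispredicts, and it is added to an activation at inference time. This is an assumption for illustration, not a description of the authors' exact procedure. The function names, the `scale` parameter, and the toy three-dimensional activations are all hypothetical; in practice the activations would be read from a chosen layer of the model.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Mean activation on correct prompts minus mean on mispredicted prompts."""
    mu_pos, mu_neg = mean(positive), mean(negative)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(activation: Vector, vector: Vector, scale: float = 1.0) -> Vector:
    """Add the (optionally scaled) steering vector to one activation."""
    return [a + scale * v for a, v in zip(activation, vector)]


# Toy activations standing in for hidden states at one layer (hypothetical).
correct_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
mispredicted_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

v = steering_vector(correct_acts, mispredicted_acts)
steered = steer([0.0, 2.0, 2.0], v)
print(v, steered)
```

In this toy setting the steered activation coincides with the mean of the "correct" activations. With a real model, the addition would typically be applied inside a forward pass, and the layer and scale would need to be chosen empirically.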

Comments:	14 pages, 7 figures
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
Cite as:	arXiv:2404.01903 [cs.CL]
	(or arXiv:2404.01903v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.01903

Submission history

From: Francesca Lucchetti
[v1] Tue, 2 Apr 2024 12:44:44 UTC (86 KB)
[v2] Fri, 13 Sep 2024 14:56:46 UTC (349 KB)
