From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Rakotonirina, Nathanaël Carraz; Hamdy, Mohammed; Campos, Jon Ander; Weber, Lucas; Testoni, Alberto; Fadaee, Marzieh; Pezzelle, Sandro; Del Tredici, Marco

Computer Science > Computation and Language

arXiv:2502.13791 (cs)

[Submitted on 19 Feb 2025]

Abstract: Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs' ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long instruction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
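To make the multi-session setting concrete, here is a minimal, hypothetical sketch of the kind of check the abstract describes: a coding instruction is introduced in one session, separated from its update by an irrelevant session, and then overridden later. The `Session` class, the `function_prefix` topic, and the "latest value wins" rule are illustrative assumptions only. They are not the MemoryCode data format or the authors' evaluation code.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical session layout (not the MemoryCode schema):
    # irrelevant filler text plus zero or more coding instructions.
    filler: list[str] = field(default_factory=list)
    # Each instruction is a (topic, value) pair; a later value for the
    # same topic overrides an earlier one (an assumed update rule).
    instructions: list[tuple[str, str]] = field(default_factory=list)


def active_instructions(history: list[Session]) -> dict[str, str]:
    """Fold instructions over all sessions in order; the latest value per topic wins."""
    state: dict[str, str] = {}
    for session in history:
        for topic, value in session.instructions:
            state[topic] = value
    return state


def follows_function_prefix(code: str, prefix: str) -> bool:
    """Return True if code defines at least one function and every `def`
    at the start of a line has a name beginning with `prefix`."""
    names = re.findall(r"^def\s+(\w+)\s*\(", code, flags=re.MULTILINE)
    return bool(names) and all(name.startswith(prefix) for name in names)


# Usage: the instruction is given, buried under an unrelated session, then updated.
history = [
    Session(filler=["Quick update on the sprint board."],
            instructions=[("function_prefix", "fn_")]),
    Session(filler=["Unrelated chat about lunch plans."]),
    Session(filler=["One more thing about style."],
            instructions=[("function_prefix", "x_")]),
]
rules = active_instructions(history)
print(rules)
print(follows_function_prefix("def x_add(a, b):\n    return a + b\n", rules["function_prefix"]))
print(follows_function_prefix("def fn_add(a, b):\n    return a + b\n", rules["function_prefix"]))
```

In this toy history, applying only the first instruction seen (`fn_`) fails the check. The model has to carry the instruction through the unrelated session and then apply the later update, which is a small-scale version of integrating information over an instruction chain.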

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.13791 [cs.CL]
	(or arXiv:2502.13791v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.13791

Submission history

From: Nathanaël Carraz Rakotonirina
[v1] Wed, 19 Feb 2025 14:58:04 UTC (2,306 KB)