Learning Code-Edit Embedding to Model Student Debugging Behavior

Heickal, Hasnain; Lan, Andrew

Abstract:Providing effective feedback for programming assignments in computer science education can be challenging: students solve problems by iteratively submitting code, executing it, and using limited feedback from the compiler or the auto-grader to debug. Analyzing student debugging behavior in this process may reveal important insights into their knowledge and inform better personalized support tools. In this work, we propose an encoder-decoder-based model that learns meaningful code-edit embeddings between consecutive student code submissions, to capture their debugging behavior. Our model leverages information on whether a student code submission passes each test case to fine-tune large language models (LLMs) to learn code editing representations. It enables personalized next-step code suggestions that maintain the student's coding style while improving test case correctness. Our model also enables us to analyze student code-editing patterns to uncover common student errors and debugging behaviors, using clustering techniques. Experimental results on a real-world student code submission dataset demonstrate that our model excels at code reconstruction and personalized code suggestion while revealing interesting patterns in student debugging behavior.

Subjects:	Software Engineering (cs.SE); Computation and Language (cs.CL)
Cite as:	arXiv:2502.19407 [cs.SE]
	(or arXiv:2502.19407v1 [cs.SE] for this version)
	https://doi.org/10.48550/arXiv.2502.19407

Computer Science > Software Engineering

Title:Learning Code-Edit Embedding to Model Student Debugging Behavior

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators