Open Vocabulary Learning on Source Code with a Graph-Structured Cache

Cvitkovic, Milan; Singh, Badal; Anandkumar, Anima

arXiv:1810.08305 (cs)

[Submitted on 18 Oct 2018 (v1), last revised 19 May 2019 (this version, v2)]

Abstract: Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters, with edges connecting each word to its occurrences in the code. We find that combining this Graph-Structured Cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task, with over $100\%$ relative improvement on the latter, at the cost of a moderate increase in computation time.
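
The cache described in the abstract (one node per word, with edges to that word's occurrences in the code) can be illustrated with a minimal sketch. This is not the authors' implementation: the regex-based identifier pattern, the splitting of identifiers into snake_case and camelCase subwords, and the names `split_subwords` and `build_cache` are all assumptions made for illustration, and the sketch stops at building the graph without any Graph Neural Network.

```python
import re
from collections import defaultdict

# Assumption: a crude identifier pattern, standing in for a real lexer or parser.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_subwords(identifier):
    """Split a snake_case or camelCase identifier into lowercase words."""
    words = []
    for part in identifier.split("_"):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return [w.lower() for w in words]


def build_cache(source):
    """Build a word-to-occurrence graph for a source string.

    Returns (occurrences, edges):
      occurrences: list of (character offset, identifier), one per identifier.
      edges: dict mapping each word node to the indices of the occurrences
             that contain it.
    """
    occurrences = []
    edges = defaultdict(list)
    for index, match in enumerate(IDENTIFIER.finditer(source)):
        occurrences.append((match.start(), match.group()))
        # dict.fromkeys drops repeated words while keeping their order.
        for word in dict.fromkeys(split_subwords(match.group())):
            edges[word].append(index)
    return occurrences, dict(edges)


source = "user_name = getUserName(user_id)"
occurrences, edges = build_cache(source)
for word, indices in edges.items():
    print(word, "->", [occurrences[i][1] for i in indices])
```

In this toy input, the single word node `user` is linked to all three identifiers, so a model reading the graph could relate `user_name`, `getUserName`, and `user_id` even if none of them appeared in a fixed training vocabulary.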

Comments:	Published in the International Conference on Machine Learning (ICML 2019), 13 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1810.08305 [cs.LG]
	(or arXiv:1810.08305v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1810.08305

Submission history

From: Milan Cvitkovic
[v1] Thu, 18 Oct 2018 23:33:11 UTC (516 KB)
[v2] Sun, 19 May 2019 22:44:00 UTC (523 KB)
