SynCode: LLM Generation with Grammar Augmentation

Ugare, Shubham; Suresh, Tarun; Kang, Hangoo; Misailovic, Sasa; Singh, Gagandeep

arXiv:2403.01632 (cs)

[Submitted on 3 Mar 2024 (v1), last revised 14 Jul 2024 (this version, v3)]

Abstract: LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format so that they can be integrated with other components in the system. Typically, the format rules, e.g., for data serialization formats such as JSON and YAML or for code in a programming language, are expressed as a context-free grammar (CFG). Due to the hallucinations and unreliability of LLMs, instructing LLMs to adhere to a specified syntax becomes an increasingly important challenge.
We present SynCode, a novel framework for efficient and general syntactical decoding with LLMs, to address this challenge. SynCode ensures soundness and completeness with respect to the CFG of a formal language, effectively retaining valid tokens while filtering out invalid ones. SynCode uses an offline-constructed, efficient lookup table, the DFA mask store, derived from the DFA of the language's grammar for efficient generation. SynCode seamlessly integrates with any language defined by a CFG, as evidenced by experiments on generating JSON, Python, and Go outputs. Our experiments evaluating the effectiveness of SynCode for JSON generation demonstrate that SynCode eliminates all syntax errors and significantly outperforms state-of-the-art baselines. Furthermore, our results show that SynCode reduces syntax errors in generated Python and Go code by 96.07%, showcasing its substantial impact on enhancing syntactical precision in LLM generation. Our code is available at this https URL
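
The idea of an offline mask table consulted at decoding time can be illustrated with a small, self-contained sketch. This is not SynCode's implementation: it assumes a single toy regular language (unsigned integers without leading zeros) rather than a full CFG, and the vocabulary, the state names, and the helpers `step`, `run`, `build_mask_store`, `mask_logits`, and `greedy_decode` are all hypothetical names introduced here for illustration. End-of-sequence handling is omitted.

```python
import math

# Toy DFA for the language 0 | [1-9][0-9]* (an assumption for illustration).
# States: "start", "zero", "num". A missing transition means a dead state (None).
def step(state, ch):
    if state == "start":
        if ch == "0":
            return "zero"
        if ch in "123456789":
            return "num"
    elif state == "num" and ch in "0123456789":
        return "num"
    return None

STATES = ["start", "zero", "num"]
# Hypothetical multi-character token vocabulary.
VOCAB = ["0", "1", "12", "00", "7", "a", "3x", "90"]

def run(state, token):
    """Feed every character of a token through the DFA; None if it dies."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

def build_mask_store(states, vocab):
    """Offline step: for each DFA state, record which tokens keep the DFA alive."""
    return {s: [run(s, tok) is not None for tok in vocab] for s in states}

MASK_STORE = build_mask_store(STATES, VOCAB)

def mask_logits(logits, state):
    """Online step: a table lookup, then set disallowed tokens to -inf."""
    return [x if ok else -math.inf for x, ok in zip(logits, MASK_STORE[state])]

def greedy_decode(logits_per_step):
    """Pick the highest-scoring allowed token at each step."""
    state, out = "start", []
    for logits in logits_per_step:
        masked = mask_logits(logits, state)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            break  # no token is allowed from this state
        out.append(VOCAB[best])
        state = run(state, VOCAB[best])
    return "".join(out), state
```

In this sketch the per-step cost is one dictionary lookup plus an elementwise mask, because all DFA simulation over the vocabulary happens once in `build_mask_store`. Handling a real CFG additionally requires tracking parser state to decide which terminals may come next, which this toy example does not attempt.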

Subjects:	Machine Learning (cs.LG); Formal Languages and Automata Theory (cs.FL); Programming Languages (cs.PL); Software Engineering (cs.SE)
Cite as:	arXiv:2403.01632 [cs.LG]
	(or arXiv:2403.01632v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.01632

Submission history

From: Shubham Ugare
[v1] Sun, 3 Mar 2024 22:38:35 UTC (2,069 KB)
[v2] Mon, 29 Apr 2024 04:05:54 UTC (4,751 KB)
[v3] Sun, 14 Jul 2024 22:22:59 UTC (4,760 KB)
