HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture

Wu, Taiqiang; Ding, Chenchen; Zhou, Wenyong; Cheng, Yuxin; Feng, Xincheng; Wang, Shuqi; Shi, Chufan; Liu, Zhengwu; Wong, Ngai

arXiv:2502.19747 (cs)

[Submitted on 27 Feb 2025]

Abstract: Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method for adapting large language models (LLMs) to downstream tasks. In this paper, we first propose to deploy LoRA-finetuned LLMs on a hybrid compute-in-memory (CIM) architecture (i.e., pretrained weights on RRAM and LoRA weights on SRAM). To address the performance degradation caused by RRAM's inherent noise, we design a novel Hardware-aware Low-rank Adaptation (HaLoRA) method, which aims to train a LoRA branch that is both robust and accurate by aligning the training objectives under ideal and noisy conditions. Experiments finetuning LLaMA 3.2 1B and 3B demonstrate HaLoRA's effectiveness across multiple reasoning tasks, achieving an improvement of up to 22.7 in average score while maintaining robustness at various noise levels.
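The hybrid deployment described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not specify the noise model or the exact loss, so the additive Gaussian noise, the mean-squared-error task loss, the way the two objectives are combined, and all names (`lora_forward`, `hardware_aware_loss`, `noise_std`, `align_weight`) are assumptions made for illustration. The only parts taken from the abstract are that noise affects the pretrained weights (RRAM) but not the LoRA branch (SRAM), and that training considers both ideal and noisy conditions.

```python
import numpy as np


def lora_forward(x, w0, a, b, noise_std=0.0, rng=None):
    """Linear layer with a LoRA branch: y = x @ W0^T + x @ (B A)^T.

    Noise is applied only to W0 (the part assumed to sit on RRAM);
    the LoRA factors A and B (assumed to sit on SRAM) stay exact.
    """
    w_eff = w0
    if noise_std > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        # Assumed noise model: additive Gaussian, scaled by max |W0|.
        scale = noise_std * np.abs(w0).max()
        w_eff = w0 + rng.normal(0.0, scale, size=w0.shape)
    return x @ w_eff.T + x @ (b @ a).T


def hardware_aware_loss(x, target, w0, a, b, noise_std, rng=None, align_weight=1.0):
    """Hypothetical objective: ideal task loss plus a penalty on the
    gap between the noisy and ideal task losses."""
    y_ideal = lora_forward(x, w0, a, b)
    y_noisy = lora_forward(x, w0, a, b, noise_std=noise_std, rng=rng)
    loss_ideal = np.mean((y_ideal - target) ** 2)
    loss_noisy = np.mean((y_noisy - target) ** 2)
    return loss_ideal + align_weight * abs(loss_noisy - loss_ideal)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))        # batch of 4, input dim 8
    w0 = rng.normal(size=(6, 8))       # frozen pretrained weight
    a = rng.normal(size=(2, 8))        # LoRA down-projection, rank 2
    b = np.zeros((6, 2))               # LoRA up-projection, zero-initialised
    target = rng.normal(size=(4, 6))
    print(hardware_aware_loss(x, target, w0, a, b, noise_std=0.02, rng=rng))
```

In an actual finetuning run only `a` and `b` would receive gradients, with fresh noise sampled at each step; an autodiff framework would replace the plain NumPy arithmetic used here.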

Comments:	7 pages
Subjects:	Computation and Language (cs.CL); Hardware Architecture (cs.AR)
Cite as:	arXiv:2502.19747 [cs.CL]
	(or arXiv:2502.19747v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.19747

Submission history

From: Taiqiang Wu
[v1] Thu, 27 Feb 2025 04:20:47 UTC (309 KB)
