ProtoBERT-LoRA: Parameter-Efficient Prototypical Finetuning for Immunotherapy Study Identification

Zhang, Shijia; Ding, Xiyu; Ding, Kai; Zhang, Jacob; Galinsky, Kevin; Wang, Mengrui; Mayers, Ryan P.; Wang, Zheyu; Kharrazi, Hadi


arXiv:2503.20179 (cs)

[Submitted on 26 Mar 2025]



Abstract: Identifying immune checkpoint inhibitor (ICI) studies in genomic repositories such as the Gene Expression Omnibus (GEO) is vital for cancer research, yet it remains challenging because of semantic ambiguity, extreme class imbalance, and limited labeled data in low-resource settings. We present ProtoBERT-LoRA, a hybrid framework that combines PubMedBERT with prototypical networks and Low-Rank Adaptation (LoRA) for efficient fine-tuning. The model enforces class-separable embeddings via episodic prototype training while preserving biomedical domain knowledge. Our dataset was split into a training set (20 positive, 20 negative), a prototype set (10 positive, 10 negative), a validation set (20 positive, 200 negative), and a test set (71 positive, 765 negative). On the test set, ProtoBERT-LoRA achieved an F1-score of 0.624 (precision: 0.481, recall: 0.887), outperforming the rule-based system, machine learning baselines, and fine-tuned PubMedBERT. Applied to 44,287 unlabeled studies, the model reduced manual review effort by 82%. Ablation studies confirmed that combining prototypes with LoRA improved performance by 29% over stand-alone LoRA.
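To make the prototype idea in the abstract concrete, the following is a minimal sketch of nearest-prototype classification. It assumes the standard prototypical-network formulation, in which each class prototype is the mean of that class's support embeddings and a query is assigned to the closest prototype by Euclidean distance. The abstract does not state the distance metric or how embeddings are pooled, so those choices, the function names, the labels `ici` and `non_ici`, and the toy 2-D vectors standing in for PubMedBERT embeddings are all illustrative assumptions, not the authors' implementation.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(support):
    """Map each class label to the mean of its support embeddings."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}


def classify(query, prototypes):
    """Return the label whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Toy 2-D "embeddings" (hypothetical values, not real model outputs).
# In the paper's setting these would come from the prototype set
# (10 positive, 10 negative) after encoding with the fine-tuned model.
support = {
    "ici": [[0.9, 0.8], [1.1, 1.0], [1.0, 1.2]],
    "non_ici": [[-1.0, -0.9], [-0.8, -1.1], [-1.2, -1.0]],
}

prototypes = build_prototypes(support)
predicted = classify([0.7, 0.9], prototypes)
print(predicted)
```

In this sketch only the prototypes need to be stored at inference time, which is one reason prototype-based classifiers are attractive when labeled examples are scarce.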

Comments:	Submitted to AMIA 2025 Annual Symposium
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2503.20179 [cs.CL]
	(or arXiv:2503.20179v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.20179

Submission history

From: Xiyu Ding
[v1] Wed, 26 Mar 2025 03:09:11 UTC (3,541 KB)
