When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets?

Iyer, Srikrishna

Computer Science > Computation and Language

arXiv:2411.16487 (cs)

[Submitted on 25 Nov 2024]

Abstract: We present our submission to the BabyLM challenge, aiming to push the boundaries of data-efficient language model pretraining. Our method builds upon deep mutual learning, introducing a student model search for diverse initialization. We address the limitation of treating students equally by formulating weighted mutual learning as a bi-level optimization problem. The inner loop learns compact students through online distillation, while the outer loop optimizes weights for better knowledge distillation from diverse students. This dynamic weighting strategy eliminates the need for a teacher model, reducing computational requirements. Our evaluations show that teacher-less methods can match or surpass teacher-supervised approaches.
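
As a rough illustration of the inner-loop idea described in the abstract, the sketch below computes a weighted mutual learning loss for one student: its own cross-entropy plus a weighted sum of KL divergences from each peer's predictions. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the exact loss form, and the use of fixed peer weights are all hypothetical, and the outer loop that optimizes the weights is not shown.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(p, label, eps=1e-12):
    # Negative log-likelihood of the gold label under distribution p.
    return -math.log(p[label] + eps)


def weighted_mutual_loss(student_logits, label, weights, i):
    """Hypothetical loss for student i: its own cross-entropy plus a
    weighted KL term pulling it toward each peer's predictions.

    `weights` holds one importance weight per student; here they are
    fixed inputs (an assumption), whereas the abstract describes an
    outer loop that optimizes them.
    """
    probs = [softmax(logits) for logits in student_logits]
    loss = cross_entropy(probs[i], label)
    for j, (p_j, w_j) in enumerate(zip(probs, weights)):
        if j != i:
            loss += w_j * kl_divergence(p_j, probs[i])
    return loss


if __name__ == "__main__":
    # Three toy students predicting over a three-token vocabulary.
    logits = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [0.5, 0.5, 2.0]]
    print(weighted_mutual_loss(logits, label=2, weights=[0.2, 0.3, 0.5], i=0))
```

With uniform weights this reduces to an unweighted, deep-mutual-learning-style objective in which every peer contributes equally, which is the limitation the abstract says the weighting is meant to address.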

Comments:	Accepted to BabyLM challenge, CoNLL Workshop, EMNLP 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2411.16487 [cs.CL]
	(or arXiv:2411.16487v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.16487

Submission history

From: Srikrishna Iyer
[v1] Mon, 25 Nov 2024 15:25:31 UTC (683 KB)
