AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Alghamdi, Asaad; Duan, Xinyu; Jiang, Wei; Wang, Zhenhai; Wu, Yimeng; Xia, Qingrong; Wang, Zhefeng; Zheng, Yi; Rezagholizadeh, Mehdi; Huai, Baoxing; Cheng, Peilun; Ghaddar, Abbas

arXiv:2306.06800 (cs)

[Submitted on 11 Jun 2023]

Abstract: Developing monolingual large Pre-trained Language Models (PLMs) has been shown to be very successful in handling different tasks in Natural Language Processing (NLP). In this work, we present AraMUS, the largest Arabic PLM, with 11B parameters, trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2306.06800 [cs.CL]
	(or arXiv:2306.06800v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2306.06800

Submission history

From: Asaad Alghamdi
[v1] Sun, 11 Jun 2023 22:55:18 UTC (314 KB)