Context Aware Lemmatization and Morphological Tagging Method in Turkish

Sayallar, Cagri

Computer Science > Computation and Language

arXiv:2501.02361 (cs)

[Submitted on 4 Jan 2025]

Title: Context Aware Lemmatization and Morphological Tagging Method in Turkish

Authors: Cagri Sayallar


Abstract: The smallest part of a word that defines the word is called the word root. Because reducing a word to its root simplifies it, word roots are used to improve performance in many applications. This study presents two models: a lemmatization model, which finds the root of a word, and a morphological tagging model, which predicts the grammatical information of the word. Both models were developed for Turkish, and both make predictions by taking the meaning of the word into account. In the literature, there is no lemmatization study for Turkish that is sensitive to word meaning. For this reason, the present study shares such a model and its results on Turkish lemmatization for the first time. In both the lemmatization and the morphological tagging models, a bidirectional LSTM is used for the spelling of words, and the Turkish BERT model is used for the meaning of words. The models are trained on the IMST and PUD datasets from Universal Dependencies. The results were compared with those of the SIGMORPHON 2019 competition, and the comparison showed that the presented models were superior.
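To make concrete why a lemmatizer may need word meaning, the following toy sketch resolves the ambiguous Turkish surface form "koyun", which can be the noun "koyun" (sheep) or the imperative of the verb "koy" (to put). This is not the paper's neural model: the candidate table, the cue words, the tag strings, and the `analyse` function are all hypothetical and hand-written for illustration, and simple cue-word overlap stands in for the contextual signal that the abstract attributes to Turkish BERT.

```python
# Toy illustration only: hand-written candidates and cue words,
# NOT the BiLSTM + BERT model described in the abstract.
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    tags: str
    cues: frozenset  # hypothetical context words that favour this reading


# Hand-picked analyses for the ambiguous surface form "koyun" (assumption).
CANDIDATES = {
    "koyun": [
        Analysis("koyun", "NOUN|Case=Nom|Number=Sing",
                 frozenset({"çoban", "sürü", "yün"})),
        Analysis("koy", "VERB|Mood=Imp|Number=Plur|Person=2",
                 frozenset({"masaya", "kitabı", "lütfen"})),
    ]
}


def analyse(tokens, index):
    """Pick the candidate whose cue words overlap most with the other tokens."""
    word = tokens[index].lower()
    context = {t.lower() for i, t in enumerate(tokens) if i != index}
    candidates = CANDIDATES.get(word)
    if not candidates:
        # Unknown word: fall back to the surface form with an empty tag.
        return Analysis(word, "_", frozenset())
    # max() keeps the first candidate on ties, so list order acts as a default.
    return max(candidates, key=lambda a: len(a.cues & context))


# "The shepherd raises sheep." vs. "Put the book on the table."
print(analyse(["Çoban", "koyun", "besliyor"], 1).lemma)
print(analyse(["Kitabı", "masaya", "koyun"], 2).lemma)
```

The same surface form receives a different lemma and tag set depending on its neighbours, which is the behaviour a context-free lookup cannot provide.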

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
MSC classes:	68T07, 68T50
ACM classes:	I.2.7
Cite as:	arXiv:2501.02361 [cs.CL]
	(or arXiv:2501.02361v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.02361

Submission history

From: Çağrı Sayallar
[v1] Sat, 4 Jan 2025 19:12:43 UTC (258 KB)
