Comparing Performance of Different Linguistically-Backed Word Embeddings for Cyberbullying Detection

Eronen, Juuso; Ptaszynski, Michal; Masui, Fumito

doi:10.19000/0002000095

arXiv:2206.01950 (cs)

[Submitted on 4 Jun 2022]

Abstract: In most cases, word embeddings are learned only from raw tokens or, in some cases, lemmas. This includes pre-trained language models like BERT. To investigate the potential of capturing deeper relations between lexical items and structures, and to filter out redundant information, we propose to preserve morphological, syntactic, and other types of linguistic information by combining them with the raw tokens or lemmas. This means, for example, including part-of-speech or dependency information in the lexical features. The word embeddings can then be trained on these combinations instead of on raw tokens alone. The method could also later be applied to the pre-training of large language models and possibly enhance their performance. This would aid in tackling problems that are more sophisticated from the point of view of linguistic representation, such as cyberbullying detection.
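
To make the idea of combining linguistic information with tokens or lemmas concrete, here is a minimal sketch of how such combined features might be built. The abstract does not specify the exact feature format, so the underscore separator, the `combine` function, and the hand-annotated example sentence are all assumptions made for illustration, not the authors' implementation. In practice the annotations would come from a tagger or parser, and the resulting strings would be fed to an embedding trainer in place of raw tokens.

```python
from typing import List, Tuple

# Hypothetical, hand-annotated sentence: (token, lemma, part-of-speech, dependency).
# A real pipeline would obtain these annotations from a tagger/parser.
Annotated = Tuple[str, str, str, str]

SENTENCE: List[Annotated] = [
    ("You", "you", "PRON", "nsubj"),
    ("are", "be", "AUX", "cop"),
    ("pathetic", "pathetic", "ADJ", "ROOT"),
]


def combine(
    sentence: List[Annotated],
    use_lemma: bool = False,
    use_pos: bool = True,
    use_dep: bool = False,
    sep: str = "_",  # assumed separator; the source does not name one
) -> List[str]:
    """Join each token (or lemma) with the selected linguistic tags."""
    features = []
    for token, lemma, pos, dep in sentence:
        parts = [lemma if use_lemma else token]
        if use_pos:
            parts.append(pos)
        if use_dep:
            parts.append(dep)
        features.append(sep.join(parts))
    return features


# Token + part-of-speech features.
print(combine(SENTENCE))
# Lemma + part-of-speech + dependency features.
print(combine(SENTENCE, use_lemma=True, use_dep=True))
```

Each combined string is treated as a single vocabulary item, so the same surface word with different grammatical roles would receive separate embedding vectors. This increases vocabulary size, which is one trade-off of the approach as sketched here.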

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2206.01950 [cs.CL]
	(or arXiv:2206.01950v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2206.01950
Journal reference:	Proceedings of the 2021 International Workshop on Modern Science and Technology, September 29, 2021
Related DOI:	https://doi.org/10.19000/0002000095

Submission history

From: Juuso Eronen
[v1] Sat, 4 Jun 2022 09:11:41 UTC (76 KB)
