CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

Frances Adriana Laureano De Leon, Florimond Guéniat, Harish Tayyar Madabushi

arXiv:2006.04597 (cs)

[Submitted on 8 Jun 2020 (v1), last revised 7 Sep 2020 (this version, v2)]

Abstract: The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in a mix of languages, a practice known as code-switching. While recent research into code-switched posts has focused on multilingual word embeddings, those embeddings were not trained on code-switched data. In this work, we present word embeddings trained on code-switched tweets, specifically tweets that mix Spanish and English, known as Spanglish. We explore the embedding space to discover how these embeddings capture the meanings of words in both languages. We test their effectiveness by participating in SemEval-2020 Task 9: "Sentiment Analysis on Code-Mixed Social Media Text". We utilised them to train a sentiment classifier that achieves an F1 score of 0.722, higher than the competition baseline of 0.656; our team (CodaLab username francesita) ranked 14th out of 29 participating teams.
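One common way to probe an embedding space like the one described in the abstract is to list each word's nearest neighbours by cosine similarity and check whether translation pairs (for example a Spanish word and its English equivalent) land close together. The following is a minimal Python sketch under stated assumptions, not the authors' code: the vocabulary and 3-dimensional vectors in TOY_EMBEDDINGS are invented purely to exercise the functions, and the helper names cosine and nearest_neighbours are hypothetical.

```python
import math

# Hypothetical toy embeddings: hand-picked 3-d vectors for illustration only.
# They are NOT the CS-Embed vectors and imply nothing about the paper's results.
TOY_EMBEDDINGS: dict[str, list[float]] = {
    "happy": [0.9, 0.1, 0.0],
    "feliz": [0.8, 0.2, 0.1],
    "sad": [-0.7, 0.1, 0.2],
    "triste": [-0.8, 0.0, 0.3],
    "casa": [0.0, 0.9, -0.2],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(
    word: str, embeddings: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k words most similar to `word`, excluding `word` itself."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    for w in ("feliz", "sad"):
        print(w, nearest_neighbours(w, TOY_EMBEDDINGS, k=2))
```

With these invented vectors, the top neighbour of feliz is happy and that of sad is triste only because the vectors were placed that way by hand; with trained embeddings, the neighbour lists themselves are what an analysis of the embedding space would examine.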

Comments:	Accepted at SemEval-2020, COLING
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2006.04597 [cs.CL]
	(or arXiv:2006.04597v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2006.04597

Submission history

From: Frances Laureano De Leon
[v1] Mon, 8 Jun 2020 13:48:17 UTC (26 KB)
[v2] Mon, 7 Sep 2020 10:39:45 UTC (26 KB)