Gender bias in (non)-contextual clinical word embeddings for stereotypical medical categories

Sogancioglu, Gizem; Mijsters, Fabian; van Uden, Amar; Peperzak, Jelle

arXiv:2208.01341 (cs)

[Submitted on 2 Aug 2022 (v1), last revised 8 Aug 2022 (this version, v2)]

Abstract: Clinical word embeddings are widely used in Bio-NLP tasks as state-of-the-art feature vector representations. Although they capture the semantics of words well, they might exhibit gender stereotypes, because the dataset on which they are trained potentially carries statistical and societal bias. This study analyzes the gender bias of clinical embeddings in three medical categories: mental disorders, sexually transmitted diseases, and personality traits. To this end, we analyze two pre-trained embeddings: the contextualized clinical-BERT and the non-contextualized BioWordVec. We show that both embeddings are biased towards sensitive gender groups, but BioWordVec exhibits a higher bias than clinical-BERT in all three categories. Moreover, our analyses show that clinical embeddings carry a high degree of bias for some medical terms and diseases, which conflicts with the medical literature. Such ill-founded associations might cause harm in downstream applications that use clinical embeddings.
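
The abstract does not say which bias metric the study uses, so the following is only an illustrative sketch of one common way to quantify gender association in word embeddings: the mean cosine similarity of a term to a set of male words minus its mean similarity to a set of female words. It is not the authors' method. The function names, the word sets, and the 3-dimensional toy vectors are all hypothetical; real vectors would have to be loaded from a model such as BioWordVec.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gender_bias_score(term_vec, male_vecs, female_vecs):
    """Mean similarity to male words minus mean similarity to female words.

    Positive: the term sits closer to the male word set.
    Negative: the term sits closer to the female word set.
    """
    male_sim = sum(cosine(term_vec, m) for m in male_vecs) / len(male_vecs)
    female_sim = sum(cosine(term_vec, f) for f in female_vecs) / len(female_vecs)
    return male_sim - female_sim


# Toy vectors invented for illustration; they are NOT taken from any real model.
toy = {
    "he": [1.0, 0.0, 0.2],
    "man": [0.9, 0.1, 0.3],
    "she": [0.0, 1.0, 0.2],
    "woman": [0.1, 0.9, 0.3],
    "term_a": [0.8, 0.2, 0.5],  # placed near the male words
    "term_b": [0.2, 0.8, 0.5],  # placed near the female words
    "term_c": [0.5, 0.5, 0.5],  # placed symmetrically between both sets
}

male = [toy["he"], toy["man"]]
female = [toy["she"], toy["woman"]]

scores = {
    term: gender_bias_score(toy[term], male, female)
    for term in ("term_a", "term_b", "term_c")
}

for term, score in scores.items():
    print(f"{term}: {score:+.3f}")
```

With these toy vectors, `term_a` gets a positive score, `term_b` a negative one, and `term_c` a score of approximately zero. For a contextualized model such as clinical-BERT, a term has no single fixed vector, so a comparable score would need an additional step (for example, averaging representations over sentences), which this sketch does not cover.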

Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as:	arXiv:2208.01341 [cs.CL]
	(or arXiv:2208.01341v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2208.01341

Submission history

From: Gizem Sogancioglu
[v1] Tue, 2 Aug 2022 10:02:21 UTC (1,251 KB)
[v2] Mon, 8 Aug 2022 14:18:42 UTC (1,251 KB)
