Ontology-based Semantic Similarity Measures for Clustering Medical Concepts in Drug Safety

Painter, Jeffery L; Haguinet, François; Powell, Gregory E; Bate, Andrew

arXiv:2503.20737 (cs)

[Submitted on 26 Mar 2025]

Abstract: Semantic similarity measures (SSMs) are widely used in biomedical research but remain underutilized in pharmacovigilance. This study evaluates six ontology-based SSMs for clustering MedDRA Preferred Terms (PTs) in drug safety data. Using the Unified Medical Language System (UMLS), we assess each method's ability to group PTs around medically meaningful centroids. A high-throughput framework with a Java API and Python and R interfaces was developed to support large-scale similarity computations. Results show that path-based methods perform moderately, with F1 scores of 0.36 for WUPALMER and 0.28 for LCH, while intrinsic information content (IC)-based measures, especially INTRINSIC-LIN and SOKAL, consistently yield better clustering accuracy (F1 score of 0.403). Validated against expert review and Standardised MedDRA Queries (SMQs), our findings highlight the promise of IC-based SSMs in enhancing pharmacovigilance workflows by improving early signal detection and reducing manual review.
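To make the distinction between path-based and intrinsic IC-based measures concrete, the following is a minimal sketch on a toy, single-parent taxonomy. It is not the authors' framework or API. The taxonomy, the concept names, and the function names are hypothetical, and the intrinsic IC uses a Seco-style descendant-count formula, which is an assumption; the paper's exact formulation over UMLS may differ.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT MedDRA or UMLS content.
PARENT = {
    "root": None,
    "cardiac": "root",
    "hepatic": "root",
    "arrhythmia": "cardiac",
    "myocardial_infarction": "cardiac",
    "tachycardia": "arrhythmia",
    "bradycardia": "arrhythmia",
    "hepatitis": "hepatic",
}


def path_to_root(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a, b):
    """Path-based measure: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def intrinsic_ic(concept):
    """Seco-style intrinsic IC (assumed): 1 - log(descendants + 1) / log(N)."""
    descendants = sum(
        1 for c in PARENT if c != concept and concept in path_to_root(c)
    )
    return 1.0 - math.log(descendants + 1) / math.log(len(PARENT))


def lin(a, b):
    """IC-based Lin measure: 2 * IC(lcs) / (IC(a) + IC(b))."""
    denominator = intrinsic_ic(a) + intrinsic_ic(b)
    if denominator == 0.0:
        return 0.0
    return 2.0 * intrinsic_ic(lcs(a, b)) / denominator


if __name__ == "__main__":
    for pair in [("tachycardia", "bradycardia"),
                 ("tachycardia", "myocardial_infarction"),
                 ("tachycardia", "hepatitis")]:
        print(pair, round(wu_palmer(*pair), 3), round(lin(*pair), 3))
```

In this sketch the path-based score depends only on node depths, whereas the IC-based score depends on how many descendants the shared ancestor has, so two terms whose only common ancestor is the root score zero under Lin but still receive a small positive Wu-Palmer score.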

Subjects:	Computation and Language (cs.CL)
ACM classes:	I.2.4; G.3; H.3.3
Cite as:	arXiv:2503.20737 [cs.CL]
	(or arXiv:2503.20737v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.20737

Submission history

From: Jeffery Painter Jr
[v1] Wed, 26 Mar 2025 17:19:00 UTC (1,403 KB)
