Graph-Convolutional Networks: Named Entity Recognition and Large Language Model Embedding in Document Clustering

Keraghel, Imed; Nadif, Mohamed

arXiv:2412.14867 (cs)

[Submitted on 19 Dec 2024]

Abstract: Recent advances in machine learning, particularly Large Language Models (LLMs) such as BERT and GPT, provide rich contextual embeddings that improve text representation. However, current document clustering approaches often ignore the deeper relationships between named entities (NEs) and the potential of LLM embeddings. This paper proposes a novel approach that integrates Named Entity Recognition (NER) and LLM embeddings within a graph-based framework for document clustering. The method builds a graph with nodes representing documents and edges weighted by named entity similarity, optimized using a graph-convolutional network (GCN). This ensures a more effective grouping of semantically related documents. Experimental results indicate that our approach outperforms conventional co-occurrence-based methods in clustering, notably for documents rich in named entities.
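
The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the GCN architecture, so the Jaccard overlap between entity sets, the single parameter-free propagation step, the random vectors standing in for LLM embeddings, and the helper names `entity_similarity_graph` and `gcn_propagate` are all assumptions made for illustration.

```python
import numpy as np


def entity_similarity_graph(entity_sets):
    """Weighted adjacency matrix over documents.

    Assumption: edge weight = Jaccard overlap of the documents'
    named-entity sets (the abstract does not name the measure).
    """
    n = len(entity_sets)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = entity_sets[i] | entity_sets[j]
            if union:
                weight = len(entity_sets[i] & entity_sets[j]) / len(union)
                adj[i, j] = adj[j, i] = weight
    return adj


def gcn_propagate(adj, features):
    """One GCN-style propagation step without learned weights:
    D^{-1/2} (A + I) D^{-1/2} X, where D is the degree matrix of A + I.
    """
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return a_norm @ features


# Toy corpus: entity sets as an NER system might return them (hypothetical).
entity_sets = [
    {"Paris", "France", "Macron"},
    {"Paris", "France", "Louvre"},
    {"NASA", "Mars"},
    {"NASA", "Mars", "SpaceX"},
]

# Random vectors stand in for LLM document embeddings (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))

adj = entity_similarity_graph(entity_sets)
smoothed = gcn_propagate(adj, embeddings)
```

In this toy example, documents that share named entities are pulled toward each other by the propagation step, while documents with no entities in common are left unconnected. The smoothed vectors could then be passed to any standard clustering algorithm; a trained GCN would additionally apply learned weight matrices and nonlinearities at each layer.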

Comments:	11 pages, 4 figures
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.14867 [cs.CL]
	(or arXiv:2412.14867v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.14867

Submission history

From: Imed Keraghel
[v1] Thu, 19 Dec 2024 14:03:22 UTC (4,102 KB)
