A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques

Gao, Lifu; Liu, Ziwei; Zhang, Qi

Computer Science > Computation and Language

arXiv:2501.14288 (cs)

[Submitted on 24 Jan 2025 (v1), last revised 31 Jan 2025 (this version, v2)]

Title:A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques

Authors:Lifu Gao, Ziwei Liu, Qi Zhang

View PDF HTML (experimental)

Abstract:The rapid advancement of large language models (LLMs) has made detecting AI-generated text an increasingly critical challenge. Traditional methods often fail to capture the nuanced semantic differences between human and machine-generated content. We therefore propose a novel approach based on semantic similarity analysis, leveraging a multi-layered architecture that combines a pre-trained DeBERTa-v3-large model, Bi-directional LSTMs, and linear attention pooling to capture both local and global semantic patterns. To enhance performance, we employ advanced input and output augmentation techniques such as sector-level context integration and wide output configurations. These techniques enable the model to learn more discriminative features and generalize across diverse domains. Experimental results show that this approach works better than traditional methods, proving its usefulness for AI-generated text detection and other text comparison tasks.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.14288 [cs.CL]
	(or arXiv:2501.14288v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.14288

Submission history

From: Ziwei Liu [view email]
[v1] Fri, 24 Jan 2025 07:07:37 UTC (737 KB)
[v2] Fri, 31 Jan 2025 02:10:36 UTC (737 KB)

Computer Science > Computation and Language

Title:A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:A Comprehensive Framework for Semantic Similarity Analysis of Human and AI-Generated Text Using Transformer Architectures and Ensemble Techniques

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators