SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information

Zhang, Xiangyu; Liu, Hexin; Zhang, Qiquan; Ahmed, Beena; Epps, Julien


arXiv:2502.10950 (eess)

[Submitted on 16 Feb 2025]



Abstract: Large Language Models (LLMs) have been increasingly adopted for health-related tasks, yet their performance in depression detection remains limited when relying solely on text input. While Retrieval-Augmented Generation (RAG) typically enhances LLM capabilities, our experiments indicate that traditional text-based RAG systems struggle to significantly improve depression detection accuracy. This challenge stems partly from the rich depression-relevant information encoded in acoustic speech patterns, information that current text-only approaches fail to capture effectively. To address this limitation, we conduct a systematic analysis of temporal speech patterns, comparing healthy individuals with those experiencing depression. Based on our findings, we introduce Speech Timing-based Retrieval-Augmented Generation (SpeechT-RAG), a novel system that leverages speech timing features for both accurate depression detection and reliable confidence estimation. This integrated approach not only outperforms traditional text-based RAG systems in detection accuracy but also enhances uncertainty quantification through a confidence scoring mechanism that naturally extends from the same temporal features. Our unified framework achieves results comparable to those of fine-tuned LLMs without additional training, while simultaneously addressing the fundamental requirements for both accuracy and trustworthiness in mental health assessment.
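
The abstract does not specify which timing features are used, how retrieval is scored, or how the confidence score is computed. The following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper: each speaker is represented by a hypothetical two-value timing vector (mean pause length in seconds, words per second), retrieval uses cosine similarity, and confidence is the similarity-weighted share of the winning label among the retrieved neighbours. The function names and the toy data are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, store, k=3):
    """Return the k (similarity, label) pairs most similar to the query."""
    scored = [(cosine(query, feats), label) for feats, label in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def vote_with_confidence(neighbours):
    """Similarity-weighted vote; confidence is the winner's weight share.

    This scoring rule is an assumption for illustration, not the paper's.
    """
    weights = {}
    for sim, label in neighbours:
        weights[label] = weights.get(label, 0.0) + max(sim, 0.0)
    total = sum(weights.values())
    if total == 0:
        return None, 0.0
    best = max(weights, key=weights.get)
    return best, weights[best] / total

# Made-up toy vectors: (mean pause seconds, words per second).
STORE = [
    ((0.9, 1.8), "depressed"),
    ((1.1, 1.6), "depressed"),
    ((0.3, 3.0), "healthy"),
    ((0.4, 2.8), "healthy"),
]

if __name__ == "__main__":
    neighbours = retrieve((1.0, 1.7), STORE, k=3)
    label, confidence = vote_with_confidence(neighbours)
    print(label, round(confidence, 3))
```

In a retrieval-augmented setup, the retrieved neighbours would more plausibly be passed to the LLM as in-context examples than voted on directly; the vote here only serves to show how a confidence value could fall out of the same timing features used for retrieval.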

Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2502.10950 [eess.AS]
	(or arXiv:2502.10950v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2502.10950

Submission history

From: Xiangyu Zhang
[v1] Sun, 16 Feb 2025 02:02:19 UTC (579 KB)
