Sparse Autoencoder Insights on Voice Embeddings

Pluth, Daniel; Zhou, Yu; Gurbani, Vijay K.

arXiv:2502.00127 (cs)

[Submitted on 31 Jan 2025]

Abstract: Recent advances in explainable machine learning have highlighted the potential of sparse autoencoders in uncovering mono-semantic features in densely encoded embeddings. While most research has focused on Large Language Model (LLM) embeddings, the applicability of this technique to other domains remains largely unexplored. This study applies sparse autoencoders to speaker embeddings generated from a TitaNet model, demonstrating the effectiveness of this technique in extracting mono-semantic features from non-textual embedded data. The results show that the extracted features exhibit characteristics similar to those found in LLM embeddings, including feature splitting and steering. The analysis reveals that the autoencoder can identify and manipulate features such as language and music, which are not evident in the original embedding. The findings suggest that sparse autoencoders can be a valuable tool for understanding and interpreting embedded data in many domains, including audio-based speaker recognition.
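
As a rough illustration of the technique the abstract names, the sketch below shows a toy sparse autoencoder over a fixed-size embedding, together with a simple form of feature steering. It is not the authors' implementation: the abstract does not specify an architecture, so the ReLU encoder, L1 sparsity penalty, dimensions, random untrained weights, and all names (`SparseAutoencoder`, `steer`, `l1_coeff`) are assumptions chosen for illustration, following a common sparse-autoencoder formulation.

```python
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class SparseAutoencoder:
    """Toy overcomplete autoencoder with random, untrained weights (hypothetical)."""

    def __init__(self, d_embed, d_features, seed=0):
        rng = random.Random(seed)
        # Encoder maps the embedding to a wider feature space.
        self.W_enc = [[rng.gauss(0, 0.5) for _ in range(d_embed)]
                      for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        # Decoder maps feature activations back to the embedding space.
        self.W_dec = [[rng.gauss(0, 0.5) for _ in range(d_features)]
                      for _ in range(d_embed)]
        self.b_dec = [0.0] * d_embed

    def encode(self, x):
        # Non-negative feature activations: ReLU(W_enc (x - b_dec) + b_enc).
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        pre = matvec(self.W_enc, centered)
        return relu([p + b for p, b in zip(pre, self.b_enc)])

    def decode(self, f):
        # Linear reconstruction: W_dec f + b_dec.
        return [y + b for y, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus an L1 penalty that encourages sparsity.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        return mse + l1_coeff * sum(abs(a) for a in f)

    def steer(self, x, feature_index, value):
        # Overwrite one feature activation, then decode the edited features.
        f = self.encode(x)
        f[feature_index] = value
        return self.decode(f)


# Stand-in for a speaker embedding; a real one would come from the speaker model.
sae = SparseAutoencoder(d_embed=4, d_features=8)
embedding = [0.2, -0.1, 0.4, 0.3]
features = sae.encode(embedding)
steered = sae.steer(embedding, feature_index=2, value=5.0)
```

Because the decoder here is linear, steering one feature shifts the reconstructed embedding along that feature's decoder column. With trained weights, a column of this kind is what would be inspected or manipulated as a candidate "language" or "music" direction; with the random weights above the features carry no meaning.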

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.00127 [cs.CL]
	(or arXiv:2502.00127v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.00127

Submission history

From: Vijay K Gurbani
[v1] Fri, 31 Jan 2025 19:21:43 UTC (689 KB)
