Scalable Language Models with Posterior Inference of Latent Thought Vectors

Kong, Deqian; Zhao, Minglu; Xu, Dehong; Pang, Bo; Wang, Shu; Honig, Edouardo; Si, Zhangzhang; Li, Chuan; Xie, Jianwen; Xie, Sirui; Wu, Ying Nian

arXiv:2502.01567 (cs)

[Submitted on 3 Feb 2025]

Abstract: We propose a novel family of language models, Latent-Thought Language Models (LTMs), which incorporate explicit latent thought vectors that follow an explicit prior model in latent space. These latent thought vectors guide the autoregressive generation of ground tokens through a Transformer decoder. Training employs a dual-rate optimization process within the classical variational Bayes framework: fast learning of local variational parameters for the posterior distribution of latent vectors, and slow learning of global decoder parameters. Empirical studies reveal that LTMs possess additional scaling dimensions beyond traditional LLMs, yielding a structured design space. Higher sample efficiency can be achieved by increasing training compute per token, with further gains possible by trading model size for more inference steps. Designed based on these scaling properties, LTMs demonstrate superior sample and parameter efficiency compared to conventional autoregressive models and discrete diffusion models. They significantly outperform these counterparts in validation perplexity and zero-shot language modeling. Additionally, LTMs exhibit emergent few-shot in-context reasoning capabilities that scale with model and latent size, and achieve competitive performance in conditional and unconditional text generation.
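
Illustrative sketch (not from the paper): the dual-rate scheme described in the abstract can be pictured on a toy problem. The code below assumes a linear Gaussian decoder x = W z + noise in place of the Transformer decoder, a standard normal prior on z, and a diagonal Gaussian posterior per data point. The function names, step counts, learning rates, and the choice to re-initialize the local parameters at every outer step are all assumptions made for illustration, and the gradients are written out by hand for this toy model only.

```python
import numpy as np

def neg_elbo(X, W, Mu, LogSig, s2):
    """Mean negative ELBO (additive constants dropped) for x ~ N(W z, s2 I), z ~ N(0, I)."""
    Sig2 = np.exp(2.0 * LogSig)
    # Expected squared reconstruction error under q(z) = N(Mu, diag(Sig2)).
    recon = np.sum((X - Mu @ W.T) ** 2, axis=1) + Sig2 @ np.sum(W ** 2, axis=0)
    # KL divergence from q(z) to the standard normal prior.
    kl = 0.5 * np.sum(Mu ** 2 + Sig2 - 1.0 - 2.0 * LogSig, axis=1)
    return float(np.mean(recon / (2.0 * s2) + kl))

def infer_local(X, W, s2, steps=16, lr_fast=0.05):
    """Fast loop: fit per-sample variational parameters with the decoder W held fixed."""
    N, K = X.shape[0], W.shape[1]
    Mu = np.zeros((N, K))       # posterior means, one row per sample
    LogSig = np.zeros((N, K))   # posterior log standard deviations
    col_sq = np.sum(W ** 2, axis=0)
    for _ in range(steps):
        Sig2 = np.exp(2.0 * LogSig)
        grad_mu = -((X - Mu @ W.T) @ W) / s2 + Mu
        grad_ls = Sig2 * (col_sq / s2 + 1.0) - 1.0
        Mu -= lr_fast * grad_mu
        LogSig -= lr_fast * grad_ls
    return Mu, LogSig

def train(X, K, s2=1.0, outer_steps=300, lr_slow=0.02, seed=0):
    """Slow loop: one small gradient step on the global decoder W per outer iteration."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], K))
    history = []
    for _ in range(outer_steps):
        # Local parameters are re-inferred from scratch each time (an assumption).
        Mu, LogSig = infer_local(X, W, s2)
        history.append(neg_elbo(X, W, Mu, LogSig, s2))
        Sig2 = np.exp(2.0 * LogSig)
        resid = X - Mu @ W.T
        grad_W = (-(resid.T @ Mu) + W * Sig2.sum(axis=0)) / (s2 * X.shape[0])
        W -= lr_slow * grad_W
    return W, history
```

In this sketch, `steps` plays the role of the number of inference steps: raising it spends more compute per sample on the posterior without adding any decoder parameters, which loosely mirrors the trade-off between model size and inference steps mentioned in the abstract.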

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2502.01567 [cs.CL]
	(or arXiv:2502.01567v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.01567

Submission history

From: Kong Deqian
[v1] Mon, 3 Feb 2025 17:50:34 UTC (2,393 KB)
