A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?

Alabdulmohsin, Ibrahim; Steiner, Andreas


arXiv:2502.14924 (cs)

[Submitted on 19 Feb 2025]


Abstract: Language exhibits a fractal structure in its information-theoretic complexity (i.e., bits per token), with self-similarity across scales and long-range dependence (LRD). In this work, we investigate whether large language models (LLMs) can replicate such fractal characteristics and identify conditions, such as temperature setting and prompting method, under which they may fail. Moreover, we find that the fractal parameters observed in natural language are contained within a narrow range, whereas those of LLM outputs vary widely, suggesting that fractal parameters might prove helpful in detecting a non-trivial portion of LLM-generated texts. Notably, these findings, and many others reported in this work, are robust to the choice of architecture, e.g., Gemini 1.0 Pro, Mistral-7B, and Gemma-2B. We also release a dataset comprising over 240,000 articles generated by various LLMs (both pretrained and instruction-tuned) with different decoding temperatures and prompting methods, along with their corresponding human-generated texts. We hope that this work highlights the complex interplay between fractal properties, prompting, and statistical mimicry in LLMs, offering insights for generating, evaluating, and detecting synthetic texts.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.14924 [cs.CL]
	(or arXiv:2502.14924v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.14924

Submission history

From: Ibrahim Alabdulmohsin
[v1] Wed, 19 Feb 2025 18:15:57 UTC (4,507 KB)
