Scanning HTML at Tens of Gigabytes per Second on ARM Processors

Lemire, Daniel

arXiv:2503.01662 (cs)

[Submitted on 3 Mar 2025]

Abstract: Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet formats such as JSON and base64. During HTML parsing, they quickly identify specific characters with a strategy called vectorized classification. We review their techniques and compare them with a faster alternative. We measure a 20-fold performance improvement in HTML scanning compared to traditional methods on recent ARM processors. Our findings highlight the potential of SIMD-based algorithms for optimizing Web browser performance.
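The abstract names vectorized classification without describing it. A common form of the technique (used, for example, in simdjson) splits each byte into its low and high nibbles, looks each nibble up in a 16-entry table, and ANDs the two results: a nonzero lane marks a byte of interest. On ARM, the lookups map to the TBL instruction (exposed as `vqtbl1q_u8`), applied to 16 bytes at once. The sketch below is a minimal model of that per-lane logic in plain Python, not SIMD code. The target set (`<`, `&`, `\r`, `\0`), the table names, and the functions `classify_block` and `find_special` are assumptions for illustration. It is neither the paper's code nor its faster alternative.

```python
# Hypothetical target set (an assumption for illustration): '<', '&', '\r', '\0'.
# Each target gets its own bit, so LOW[c & 0xF] & HIGH[c >> 4] is nonzero
# only when c has the same low AND high nibble as some target, i.e. c is a target.
TARGETS = b"<&\r\0"
LOW_NIBBLE_TABLE = [0] * 16
HIGH_NIBBLE_TABLE = [0] * 16
for bit, ch in enumerate(TARGETS):
    LOW_NIBBLE_TABLE[ch & 0x0F] |= 1 << bit
    HIGH_NIBBLE_TABLE[ch >> 4] |= 1 << bit


def classify_block(block):
    """Return a bitmask whose bit i is set when block[i] is a target byte.

    A SIMD unit would do both table lookups, the AND, and the zero test on
    all 16 lanes at once; this loop models the same per-lane logic.
    """
    mask = 0
    for i, c in enumerate(block):
        if LOW_NIBBLE_TABLE[c & 0x0F] & HIGH_NIBBLE_TABLE[c >> 4]:
            mask |= 1 << i
    return mask


def find_special(data, block_size=16):
    """Return the offsets of all target bytes, scanning block by block."""
    positions = []
    for start in range(0, len(data), block_size):
        # The last block may be shorter than block_size; slicing handles it.
        mask = classify_block(data[start:start + block_size])
        while mask:
            low = mask & -mask  # isolate the lowest set bit
            positions.append(start + low.bit_length() - 1)
            mask ^= low  # clear that bit and continue
    return positions


if __name__ == "__main__":
    html = b"<p>Fish &amp; chips\r\n</p>"
    print(find_special(html))  # [0, 8, 19, 21]
```

Giving each target character its own bit keeps the classification exact: a byte such as `=` (0x3D) shares its low nibble with `\r` and its high nibble with `<`, but the two lookups carry different bits, so their AND is zero.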

Subjects:	Data Structures and Algorithms (cs.DS); Hardware Architecture (cs.AR)
Cite as:	arXiv:2503.01662 [cs.DS]
	(or arXiv:2503.01662v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2503.01662

Submission history

From: Daniel Lemire
[v1] Mon, 3 Mar 2025 15:38:20 UTC (68 KB)