MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory

Riedel, Samuel; Cavalcante, Matheus; Andri, Renzo; Benini, Luca

doi:10.1109/TC.2023.3307796


arXiv:2303.17742 (cs)

[Submitted on 30 Mar 2023 (v1), last revised 28 Nov 2023 (this version, v2)]

Abstract: Shared L1 memory clusters are a common architectural pattern (e.g., in GPGPUs) for building efficient and flexible multi-processing-element (PE) engines. However, it is commonly believed that such tightly coupled clusters cannot scale beyond a few tens of PEs. In this work, we tackle scaling shared L1 clusters to hundreds of PEs while supporting a flexible and productive programming model and maintaining high efficiency. We present MemPool, a manycore system with 256 RV32IMAXpulpimg "Snitch" cores featuring application-tunable functional units. We designed and implemented an efficient low-latency PE-to-L1-memory interconnect, an optimized instruction path to ensure each PE's independent execution, and a powerful DMA engine and system interconnect to stream data in and out. MemPool is easy to program: all cores share a global view of a large, multi-banked L1 scratchpad memory, accessible within at most five cycles in the absence of conflicts. We provide multiple runtimes to program MemPool at different abstraction levels and illustrate its versatility with a wide set of applications. MemPool runs at 600 MHz (60 gate delays) in typical conditions (TT/0.80 V/25 °C) in 22 nm FDX technology and achieves a performance of up to 229 GOPS or 180 GOPS/W with less than 2% of execution stalls.
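To illustrate what "multi-banked" and "in the absence of conflicts" mean for a shared scratchpad, here is a minimal sketch that assumes word-level address interleaving, 4-byte words, and single-ported banks. The abstract does not specify MemPool's interleaving scheme or bank count. The value NUM_BANKS = 1024 and the helpers `bank_of`, `row_of`, and `conflicting_banks` are hypothetical and are not part of any MemPool runtime.

```python
"""Sketch: word-interleaved bank mapping in a multi-banked scratchpad.

Assumptions (hypothetical, not taken from the MemPool abstract):
- 4-byte words interleaved across banks at word granularity
- NUM_BANKS is an illustrative parameter
- each bank serves one request per cycle (single-ported)
"""
from collections import Counter

WORD_BYTES = 4      # assumed word size (RV32 load/store width)
NUM_BANKS = 1024    # hypothetical bank count, for illustration only


def bank_of(addr: int, num_banks: int = NUM_BANKS) -> int:
    """Return the bank that serves byte address `addr` under word interleaving."""
    return (addr // WORD_BYTES) % num_banks


def row_of(addr: int, num_banks: int = NUM_BANKS) -> int:
    """Return the word offset (row) of `addr` inside its bank."""
    return (addr // WORD_BYTES) // num_banks


def conflicting_banks(addrs, num_banks: int = NUM_BANKS):
    """Return the sorted banks targeted by more than one same-cycle request.

    With single-ported banks, only one request per bank is served each cycle.
    Any extra requests must wait, which adds latency beyond the conflict-free case.
    """
    counts = Counter(bank_of(a, num_banks) for a in addrs)
    return sorted(b for b, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Unit-stride words requested by 8 cores land in 8 distinct banks.
    print(conflicting_banks([WORD_BYTES * i for i in range(8)]))
    # A stride of NUM_BANKS words maps all 8 requests onto bank 0.
    print(conflicting_banks([WORD_BYTES * NUM_BANKS * i for i in range(8)]))
```

Under these assumptions, unit-stride accesses spread evenly across banks. A stride equal to the bank count, in contrast, maps every request to the same bank and serializes the accesses. This is why the five-cycle figure is stated for the conflict-free case.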

Comments:	14 pages, 17 figures, 2 tables, Published in IEEE Transactions on Computers
Subjects:	Hardware Architecture (cs.AR)
Cite as:	arXiv:2303.17742 [cs.AR]
	(or arXiv:2303.17742v2 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2303.17742
Journal reference:	IEEE Transactions on Computers, vol. 72, no. 12, pp. 3561-3575, Dec. 2023
Related DOI:	https://doi.org/10.1109/TC.2023.3307796

Submission history

From: Samuel Riedel
[v1] Thu, 30 Mar 2023 23:30:06 UTC (10,375 KB)
[v2] Tue, 28 Nov 2023 10:16:26 UTC (8,573 KB)