Computer Science > Distributed, Parallel, and Cluster Computing
[Submitted on 10 Jul 2024 (v1), last revised 20 Jul 2024 (this version, v2)]
Title: A Transverse-Read-assisted Valid-Bit Collection to Accelerate Stochastic Computing MAC for Energy-Efficient in-RTM DNNs
Abstract: Coordinating racetrack memory (RM) with stochastic computing (SC) is an attractive way to build an ultra-low-power neuron architecture. However, this combination has long been questioned because of one fatal weakness: the narrow bit-view of the RM-MTJ structure, i.e., its shift-and-access pattern, cannot physically match the high throughput required by directly stored stochastic sequences. Fortunately, the recently developed Transverse Read (TR) gives RM a wider segment-view by detecting the resistance of the domain walls between a pair of MTJs on a single nanowire, so RM can access the sequences faster without any substantial domain shift. To exploit TR for power-efficient SC-DNNs, we propose a segment-based compression that uses one-cycle TR to read only the kernel segments of stochastic sequences while removing a large number of redundant segments for ultra-high storage density. In the decompression stage, low-discrepancy stochastic sequences can be quickly reassembled from the kernel segments by a select-and-output loop rather than slowly regenerated by costly stochastic number generators (SNGs). Because TR provides an ideal in-memory acceleration of ones-counting, counter-free SC-MACs are designed and deployed near the RMs to form a power-efficient neuron architecture in which the binary results of TR are passed directly to activation without sluggish Accumulative Parallel Counters (APCs). The results show that, under the TR-aided RM model, the power efficiency, speed, and stochastic accuracy of Seed-based Fast Stochastic Computing significantly enhance the performance of DNNs. Computation is 2.88x faster on LeNet-5 and 4.40x faster on VGG-19 compared with CORUSCANT. Deployed near the memory, the integration of TR with RTM eliminates the need for slow APCs and improves access speed to stochastic sequences.
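As a rough illustration of the ideas the abstract combines, the following Python sketch shows a unipolar stochastic MAC in which each product stream is summed through per-segment ones counts, a software stand-in for one transverse read per segment. It is not the paper's design: the function names, the Halton-style low-discrepancy encoding (bases 2 and 3), and the segment length of 4 are all assumptions made for this example, and the segment-based compression itself is not modeled.

```python
def van_der_corput(i, base):
    """Radical inverse of i in the given base, a value in [0, 1)."""
    q, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        q += digit / denom
    return q


def encode(p, length, base):
    """Unipolar stream: bit k is 1 when the k-th low-discrepancy point is below p."""
    return [1 if van_der_corput(k, base) < p else 0 for k in range(length)]


def segment_counts(bits, seg_len):
    """Ones count of each seg_len-bit segment (assumed stand-in for one TR each)."""
    return [sum(bits[i:i + seg_len]) for i in range(0, len(bits), seg_len)]


def sc_mac(inputs, weights, length=256, seg_len=4):
    """Estimate sum(x * w) for values in [0, 1] from AND-ed stochastic streams."""
    total_ones = 0
    for x, w in zip(inputs, weights):
        xs = encode(x, length, base=2)  # different bases keep the two
        ws = encode(w, length, base=3)  # streams approximately uncorrelated
        product = [a & b for a, b in zip(xs, ws)]  # AND gate multiplies unipolar streams
        # Accumulate segment counts directly instead of counting bit by bit.
        total_ones += sum(segment_counts(product, seg_len))
    return total_ones / length


if __name__ == "__main__":
    estimate = sc_mac([0.5, 0.25], [0.5, 0.75])
    print(f"estimate = {estimate:.4f}, exact = {0.5 * 0.5 + 0.25 * 0.75:.4f}")
```

The estimate converges toward the exact dot product as `length` grows. Summing segment counts gives the same total as a bit-serial counter, which is the property that lets a segment-wide read replace per-bit counting in this simplified model.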
Submission history
From: Zhiying Zhang
[v1] Wed, 10 Jul 2024 09:02:35 UTC (1,235 KB)
[v2] Sat, 20 Jul 2024 09:32:37 UTC (1,235 KB)