Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

Peng, Haosong; Feng, Wei; Li, Hao; Zhan, Yufeng; Jin, Ren; Xia, Yuanqing

arXiv:2404.09245 (cs)

[Submitted on 14 Apr 2024 (v1), last revised 26 Sep 2024 (this version, v2)]

Abstract: The advent of edge computing has made real-time intelligent video analytics feasible. Previous works, based on traditional model architectures (e.g., CNN, RNN), employ various strategies to filter out non-region-of-interest content to minimize bandwidth and computation consumption, but show inferior performance in adverse environments. Recently, visual foundation models based on transformers have shown strong performance in adverse environments due to their generalization capability. However, they require a large amount of computation power, which limits their application in real-time intelligent video analytics. In this paper, we find that visual foundation models like Vision Transformer (ViT) also have a dedicated acceleration mechanism for video analytics. To this end, we introduce Arena, an end-to-end edge-assisted video inference acceleration system based on ViT. We exploit the fact that ViT inference can be accelerated through token pruning, by offloading and feeding only Patches-of-Interest to the downstream models. Additionally, we design an adaptive keyframe inference switching algorithm tailored to different videos, capable of adapting to the current video content to jointly optimize accuracy and bandwidth. Through extensive experiments, our findings reveal that Arena can boost inference speeds by up to 1.58× and 1.82× on average while consuming only 47% and 31% of the bandwidth, respectively, all with high inference accuracy.
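The abstract does not say how Patches-of-Interest are chosen, so the following is only a minimal sketch of the general idea of keeping a subset of ViT-style patches. It assumes a simple frame-differencing criterion against a keyframe, which is a stand-in and not necessarily the method used in Arena. All names here (`patch_scores`, `select_patches`, `kept_fraction`) and the parameters `patch` and `threshold` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame: rows of pixel intensities


def patch_scores(key: Frame, cur: Frame, patch: int) -> List[List[float]]:
    """Mean absolute pixel difference between two frames, per patch tile."""
    rows, cols = len(cur) // patch, len(cur[0]) // patch
    scores = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for y in range(r * patch, (r + 1) * patch):
                for x in range(c * patch, (c + 1) * patch):
                    total += abs(cur[y][x] - key[y][x])
            scores[r][c] = total / (patch * patch)
    return scores


def select_patches(key: Frame, cur: Frame, patch: int = 16,
                   threshold: float = 10.0) -> List[Tuple[int, int]]:
    """Grid coordinates (row, col) of patches whose score exceeds threshold.

    Assumption: "interest" is approximated by change relative to the keyframe.
    """
    scores = patch_scores(key, cur, patch)
    return [(r, c)
            for r, row in enumerate(scores)
            for c, s in enumerate(row)
            if s > threshold]


def kept_fraction(selected: List[Tuple[int, int]], height: int, width: int,
                  patch: int) -> float:
    """Share of the patch grid that would be offloaded and fed to the model."""
    return len(selected) / ((height // patch) * (width // patch))


if __name__ == "__main__":
    # Synthetic 64x64 example: one 16x16 block changes relative to the keyframe.
    key = [[0] * 64 for _ in range(64)]
    cur = [row[:] for row in key]
    for y in range(16, 32):
        for x in range(32, 48):
            cur[y][x] = 200
    selected = select_patches(key, cur)
    print(selected)                             # [(1, 2)]
    print(kept_fraction(selected, 64, 64, 16))  # 0.0625
```

In this toy case only one of the 16 patches is kept, so both the transmitted data and the number of tokens entering the transformer shrink accordingly. A real system would also need to carry each kept patch's grid position so the model can apply the correct positional embedding.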

Subjects:	Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.09245 [cs.MM]
	(or arXiv:2404.09245v2 [cs.MM] for this version)
	https://doi.org/10.48550/arXiv.2404.09245

Submission history

From: Haosong Peng
[v1] Sun, 14 Apr 2024 13:14:13 UTC (30,401 KB)
[v2] Thu, 26 Sep 2024 01:25:22 UTC (26,389 KB)