Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices

Gan, Yu; Liang, Mingyu; Dev, Sundar; Lo, David; Delimitrou, Christina

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2101.00267 (cs)

[Submitted on 1 Jan 2021]

Abstract: Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely-coupled microservices. Despite the modularity and elasticity that microservices offer, they also complicate cluster management and performance debugging, as dependencies between tiers introduce backpressure and cascading QoS violations.
We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices. Sage leverages unsupervised ML models to circumvent the overhead of trace labeling, captures the impact of dependencies between microservices to determine the root cause of unpredictable performance online, and applies corrective actions to recover a cloud service's QoS. In experiments on both dedicated local clusters and large clusters on Google Compute Engine, we show that Sage consistently achieves over 93% accuracy in correctly identifying the root cause of QoS violations, and improves performance predictability.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Cite as:	arXiv:2101.00267 [cs.DC]
	(or arXiv:2101.00267v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2101.00267

Submission history

From: Christina Delimitrou
[v1] Fri, 1 Jan 2021 16:44:37 UTC (1,756 KB)
