MLOS: An Infrastructure for Automated Software Performance Engineering

Curino, Carlo; Godwal, Neha; Kroth, Brian; Kuryata, Sergiy; Lapinski, Greg; Liu, Siqi; Oks, Slava; Poppe, Olga; Smiechowski, Adam; Thayer, Ed; Weimer, Markus; Zhu, Yiwen

doi:10.1145/3399579.3399927

arXiv:2006.02155 (cs)

[Submitted on 1 Jun 2020 (v1), last revised 4 Jun 2020 (this version, v2)]

Abstract: Developing modern systems software is a complex task that combines business logic programming and Software Performance Engineering (SPE). The latter is an experimental and labor-intensive activity focused on optimizing the system for a given hardware, software, and workload (hw/sw/wl) context.
Today's SPE is performed during build/release phases by specialized teams, and is cursed by: 1) a lack of standardized and automated tools, 2) significant repeated work as the hw/sw/wl context changes, and 3) fragility induced by "one-size-fits-all" tuning (where improvements on one workload or component may impact others). The net result: despite costly investments, system software is often outside its optimal operating point, anecdotally leaving 30% to 40% of performance on the table.
Recent developments in Data Science (DS) hint at an opportunity: combining DS tooling and methodologies with a new developer experience to transform the practice of SPE. In this paper we present MLOS, an ML-powered infrastructure and methodology to democratize and automate Software Performance Engineering. MLOS enables continuous, instance-level, robust, and trackable systems optimization. MLOS is being developed and employed within Microsoft to optimize SQL Server performance. Early results indicated that component-level optimizations can lead to 20%-90% improvements when custom-tuning for a specific hw/sw/wl, hinting at a significant opportunity. However, several research challenges remain that will require community involvement. To this end, we are in the process of open-sourcing the MLOS core infrastructure, and we are engaging with academic institutions to create an educational program around Software 2.0 and MLOS ideas.

Comments:	4 pages, DEEM 2020
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Machine Learning (cs.LG); Performance (cs.PF); Software Engineering (cs.SE)
Cite as:	arXiv:2006.02155 [cs.DC]
	(or arXiv:2006.02155v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2006.02155
Related DOI:	https://doi.org/10.1145/3399579.3399927

Submission history

From: Brian Kroth
[v1] Mon, 1 Jun 2020 22:38:30 UTC (3,432 KB)
[v2] Thu, 4 Jun 2020 11:10:53 UTC (3,433 KB)
