An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

Miao, Yukai; Bai, Yu; Chen, Li; Li, Dan; Sun, Haifeng; Wang, Xizheng; Luo, Ziqiu; Sun, Dapeng; Xu, Xiuting; Zhang, Qi; Xiang, Chao; Li, Xinchi

Computer Science > Computation and Language

arXiv:2309.05557v2 (cs)

[Submitted on 11 Sep 2023 (v1), revised 12 Sep 2023 (this version, v2), latest version 19 Sep 2023 (v3)]

Title: An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

Authors: Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Dapeng Sun, Xiuting Xu, Qi Zhang, Chao Xiang, Xinchi Li


Abstract: Large language models (LLMs) can respond to human language queries and have shown strong potential for applications in network operations (NetOps). Thanks to the large amount of commonsense knowledge they encode, LLMs achieve much better inference accuracy than traditional models and exhibit strong abilities in generalization, reasoning, and code generation. These abilities may provide a crucial boost to automated and intelligent NetOps. However, how well LLMs perform on various NetOps tasks remains under-explored. In this work, we systematically assess the capabilities, strengths, and limitations of selected LLMs in the field of NetOps. The evaluation is conducted on a collection of 5,732 questions about NetOps and covers 26 publicly available general-domain LLMs, including ChatGPT, LLaMA, and Falcon. We also fine-tune some of these LLMs on our collected NetOps corpus and evaluate the resulting models. The evaluation method follows widely adopted benchmarks for general-domain LLMs, combined with Chain-of-Thought prompts and Retrieval-Augmented Generation. The results show that only GPT-4 achieves high accuracy equivalent to passing the NetOps certification exam for humans, while all the other LLMs have much lower accuracy. However, some open models such as LLaMA 2 still demonstrate significant potential. Furthermore, we evaluate the impact of factors such as model parameters, prompt engineering, and instruction fine-tuning. This work should be treated as an initial effort toward the systematic evaluation of LLMs in NetOps; a more rigorous study is required for production use. The evaluation code and dataset will be released to benefit future research.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Cite as:	arXiv:2309.05557 [cs.CL]
	(or arXiv:2309.05557v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.05557

Submission history

From: Yukai Miao
[v1] Mon, 11 Sep 2023 15:45:40 UTC (249 KB)
[v2] Tue, 12 Sep 2023 12:15:38 UTC (257 KB)
[v3] Tue, 19 Sep 2023 16:04:25 UTC (252 KB)
