AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection

Luo, Weidi; Dai, Shenghong; Liu, Xiaogeng; Banerjee, Suman; Sun, Huan; Chen, Muhao; Xiao, Chaowei

arXiv:2502.11448 (cs)

[Submitted on 17 Feb 2025 (v1), last revised 18 Feb 2025 (this version, v2)]

Abstract: Rapid advances in Large Language Models (LLMs) have enabled their deployment as autonomous agents that handle complex tasks in dynamic environments. These LLMs demonstrate strong problem-solving capabilities and adaptability to multifaceted scenarios. However, their use as agents also introduces significant risks. These include task-specific risks, which the agent administrator identifies based on the specific task requirements and constraints, and systemic risks, which stem from vulnerabilities in the agents' design or interactions and can compromise the confidentiality, integrity, or availability (CIA) of information and trigger security risks. Existing defenses fail to mitigate these risks adaptively and effectively. In this paper, we propose AGrail, a lifelong agent guardrail to enhance LLM agent safety, which features adaptive safety check generation, effective safety check optimization, and tool compatibility and flexibility. Extensive experiments demonstrate that AGrail not only achieves strong performance against task-specific and systemic risks but also exhibits transferability across different LLM agents' tasks.

Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.11448 [cs.AI]
	(or arXiv:2502.11448v2 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2502.11448

Submission history

From: Weidi Luo
[v1] Mon, 17 Feb 2025 05:12:33 UTC (14,214 KB)
[v2] Tue, 18 Feb 2025 05:37:44 UTC (7,089 KB)
