AgentGuard: Repurposing Agentic Orchestrator for Safety Evaluation of Tool Orchestration

Chen, Jizhou; Cong, Samuel Lee

arXiv:2502.09809 (cs)

[Submitted on 13 Feb 2025]

Abstract: The integration of tool use into large language models (LLMs) enables agentic systems with real-world impact. At the same time, unlike standalone LLMs, compromised agents can execute malicious workflows whose consequences are more serious because of their tool-use capability. We propose AgentGuard, a framework that autonomously discovers and validates unsafe tool-use workflows, then generates safety constraints to confine agent behavior, providing a baseline safety guarantee at deployment. AgentGuard leverages the LLM orchestrator's innate capabilities (knowledge of tool functionalities, scalable and realistic workflow generation, and tool execution privileges) to act as its own safety evaluator. The framework operates in four phases: identifying unsafe workflows, validating them in real-world execution, generating safety constraints, and validating constraint efficacy. The output, an evaluation report containing unsafe workflows, test cases, and validated constraints, enables multiple security applications. We empirically demonstrate AgentGuard's feasibility with experiments. With this exploratory work, we hope to inspire the establishment of standardized testing and hardening procedures for LLM agents to enhance their trustworthiness in real-world applications.
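
As a rough illustration of how the four phases named in the abstract could fit together, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe one, so every name here (`Finding`, `Report`, `evaluate`, and the `toy_*` stand-ins) is hypothetical, and the orchestrator calls are replaced by plain functions rather than a real LLM or real tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical representation: a workflow is an ordered tuple of tool names.
Workflow = Tuple[str, ...]


@dataclass
class Finding:
    workflow: Workflow      # an unsafe workflow confirmed by execution
    constraint: str         # the rule generated to confine it
    constraint_holds: bool  # True if the rule stopped the workflow on re-run


@dataclass
class Report:
    findings: List[Finding] = field(default_factory=list)


def evaluate(
    propose: Callable[[], List[Workflow]],
    execute: Callable[[Workflow, List[str]], bool],
    constrain: Callable[[Workflow], str],
) -> Report:
    """Run the four phases; `execute` returns True if the unsafe outcome occurred."""
    report = Report()
    for wf in propose():                   # phase 1: identify candidate unsafe workflows
        if not execute(wf, []):            # phase 2: validate by unconstrained execution
            continue                       # unsafe outcome not reproduced, so skip
        rule = constrain(wf)               # phase 3: generate a safety constraint
        blocked = not execute(wf, [rule])  # phase 4: validate constraint efficacy
        report.findings.append(Finding(wf, rule, blocked))
    return report


# Toy stand-ins for the orchestrator (assumptions for illustration only).
def toy_propose() -> List[Workflow]:
    return [("read_file", "send_email"), ("list_dir",)]


def toy_constrain(wf: Workflow) -> str:
    return "deny " + "->".join(wf)


def toy_execute(wf: Workflow, rules: List[str]) -> bool:
    if "send_email" not in wf:
        return False                       # nothing harmful happens in this toy model
    return toy_constrain(wf) not in rules  # blocked once a matching rule is active


report = evaluate(toy_propose, toy_execute, toy_constrain)
for f in report.findings:
    print(f.workflow, f.constraint, f.constraint_holds)
```

In this toy run only the workflow that reaches `send_email` is confirmed as unsafe, and its generated rule blocks it on re-execution. In a real setting each callable would presumably be backed by the orchestrator itself, which is the repurposing idea the abstract describes.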

Comments:	Project report of AgentGuard in LLM Agent MOOC Hackathon hosted by UC Berkeley in 2024
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2502.09809 [cs.CR]
	(or arXiv:2502.09809v1 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2502.09809

Submission history

From: Jizhou Chen
[v1] Thu, 13 Feb 2025 23:00:33 UTC (34 KB)
