AI Security
How to Audit a Managed AI Agent That Can Browse, Run Shell Commands, or Edit Files
TL;DR: Treat a managed AI agent that can browse the web, run shell commands, or edit files as a privileged service. Define its attack surface, verify vendor‑provided sandbox guarantees, enforce least‑privilege scopes, log every external interaction, and run regular red‑team style tests. Use a concise audit checklist and continuous monitoring to keep the agent safe for your business.
What does the audit need to cover?
When an AI agent can perform actions beyond plain text generation, the risk profile expands dramatically. Your audit should answer three questions:
- What capabilities does the agent expose (web browsing, shell execution, file manipulation)?
- How are those capabilities sandboxed or limited by the vendor?
- What controls does your organization place around the agent (API scopes, approval workflows, logging)?
Start by reviewing the vendor’s official documentation – for example, the Claude Managed Agents overview explains the default sandbox model and how to configure permission sets.
Identify the critical attack surfaces
Each capability introduces a distinct attack surface:
- Web browsing: the agent can fetch external URLs, potentially exfiltrating data or pulling malicious scripts.
- Shell commands: direct OS interaction can lead to privilege escalation, data leakage, or ransomware.
- File edits: modifying files on shared storage may corrupt codebases, configuration, or customer data.
Map these surfaces to the data you store or process. If the agent never needs to edit production config files, that capability should be disabled entirely.
Build an audit checklist
Use the following checklist as a concrete artifact you can attach to a ticket or compliance spreadsheet. Check each item before the agent goes live and repeat quarterly.
{
"capabilities": ["web_browse", "shell_exec", "file_edit"],
"sandbox": {
"type": "container",
"resource_limits": {"cpu": "0.5", "memory": "256Mi"},
"network": {"allowed_domains": ["api.mycompany.com"]}
},
"api_scopes": ["read:customer_data"],
"human_approval": true,
"logging": {
"events": ["http_request", "shell_command", "file_write"],
"retention_days": 30
}
}
Key checklist items:
- Confirm the vendor’s sandbox model (container, VM, or serverless) and its isolation guarantees.
- Verify that network egress is restricted to a whitelist of domains you control.
- Ensure shell commands run with a non‑root user and have strict resource limits.
- Limit file‑edit permissions to a dedicated, version‑controlled directory (e.g., a Git repo branch).
- Require a human‑in‑the‑loop approval step for any command that writes to production storage.
- Enable immutable audit logs for every request, command, and file operation.
Leverage vendor security controls
Most managed‑agent platforms expose configuration APIs to tighten permissions. For Claude Managed Agents, you can set allowed_actions and network_policy in the agent definition. OpenAI Agents provide a tool_use policy that can disable shell execution entirely.
Reference the OWASP GenAI Security Project (genai.owasp.org) for a high‑level threat model and recommended controls such as “sandbox isolation” and “output validation”. Align your configuration with those recommendations.
Implement runtime monitoring and logging
Even with a hardened sandbox, you need visibility into what the agent actually does. Set up a centralized log collector (e.g., Loki, CloudWatch) and ingest the following fields:
{
"timestamp": "2024-10-12T08:15:30Z",
"agent_id": "sales‑assistant‑01",
"action": "shell_exec",
"command": "curl -s https://api.mycompany.com/v1/customers",
"status": "success",
"output_hash": "sha256:abcd..."
}
Use alerts for any command that accesses the filesystem outside the approved directory or attempts network calls to non‑whitelisted domains.
Test for privilege escalation and sandbox escape
Run periodic red‑team exercises that simulate an attacker controlling the agent’s prompt. Sample test cases:
- Ask the agent to download a binary and execute it with
chmod +x– verify the sandbox blocks it. - Request the agent to read
/etc/passwd– ensure the response is redacted. - Provide a malicious URL that hosts a JavaScript payload and see if the browsing module sanitizes it.
Document findings, remediate misconfigurations, and update the checklist accordingly.
Review incident response procedures
If the agent misbehaves, you need a clear rollback plan:
- Immediately disable the agent via the vendor’s management console.
- Collect the last 24 hours of logs and isolate any files created or modified.
- Run a forensic scan on the host environment (container image diff, file integrity checks).
- Restore affected resources from backups and document the root cause.
Embedding these steps into a small‑team playbook ensures you can respond quickly without needing a dedicated security ops team.
When to involve AISecAll
If you need a custom audit script, a sandbox hardening review, or ongoing monitoring as a managed service, our team can help you implement the checklist above and integrate it with your existing DevOps pipelines.
Need a practical AI security review?
AISecAll reviews prompts, tool permissions, document flows, and agent behavior so small teams can use AI without guessing where the risk sits.