Agent Secret Store Docs
🔒 Security

Security Best Practices

Production hardening for Agent Secret Store — from namespace design to incident response. Follow these patterns to run a credential store that survives real adversarial conditions.

Least-privilege namespace design

The most important security decision is your namespace structure. Namespaces directly control how granularly you can scope tokens. A flat namespace means all agents get access to everything in the same scope; a service-hierarchical namespace lets you issue tokens that cover exactly one service's credentials.

Shell
# ❌ BAD: One flat namespace — all agents can access all secrets
production/openai-key
production/stripe-key
production/db-password
production/github-token

# ✅ GOOD: Service-scoped namespaces
# Scope tokens to the minimum path prefix needed
payments-service/stripe/secret-key
payments-service/stripe/webhook-secret
ml-platform/openai/api-key
ml-platform/pinecone/api-key
auth-service/google/client-secret
data-pipeline/postgres/connection-string

# ✅ GOOD: Environment isolation prevents cross-env leakage
production/payments/stripe/secret-key   # token: secrets:read:production/payments/*
staging/payments/stripe/secret-key     # token: secrets:read:staging/payments/*
# A staging agent CANNOT read production secrets
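A scope string like secrets:read:production/payments/* is matched against the requested path. As a minimal sketch of that matching, using Python's fnmatch (the exact server-side matching rules are an assumption here, not the vault's documented matcher):

```python
from fnmatch import fnmatch

def scope_covers(scope: str, action: str, path: str) -> bool:
    """Check whether a token scope covers an action on a secret path.

    Scope format assumed: "secrets:<action>:<path-glob>", e.g.
    "secrets:read:production/payments/*". Note fnmatch's "*" matches
    across "/" as well, so a prefix glob covers nested paths.
    """
    prefix, scope_action, path_glob = scope.split(":", 2)
    if prefix != "secrets":
        return False
    if scope_action not in ("*", action):
        return False
    return fnmatch(path, path_glob)

# A staging-scoped token cannot read production secrets:
print(scope_covers("secrets:read:staging/payments/*",
                   "read", "production/payments/stripe/secret-key"))  # False
print(scope_covers("secrets:read:staging/payments/*",
                   "read", "staging/payments/stripe/secret-key"))     # True
```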

Namespace changes are breaking

Restructuring namespaces after go-live requires updating all scoped tokens and agent configurations. Design your namespace hierarchy before your first production deployment.

Token best practices

Scoped tokens are the primary access control mechanism. Issue the narrowest possible scope for the shortest necessary TTL:

token_best_practices.py
from agentsecretstore import AgentVault

async def issue_minimal_tokens():
    async with AgentVault() as vault:
        # ❌ BAD: overly broad scope
        bad_token = await vault.request_token(
            scope="secrets:*:*",           # Full access — never do this
            ttl_seconds=86400,             # 24 hours — too long
        )

        # ✅ GOOD: narrowly scoped, short TTL
        good_token = await vault.request_token(
            scope="secrets:read:production/ml-platform/openai/*",
            ttl_seconds=1800,              # 30 minutes — matches task duration
            description="GPT-4 inference batch job #4827",
            allowed_ips=["10.0.1.50"],    # Only from the known agent host
        )

        # ✅ GOOD: single-use for high-risk one-off operations
        single_use = await vault.request_token(
            scope="secrets:read:production/payments/stripe/secret-key",
            ttl_seconds=300,               # 5 minutes
            max_uses=1,                    # Burns after one read
            description="One-time payment intent creation",
        )
Use case | Recommended TTL | Notes
One-off task (payment, send email) | 5–15 minutes | Use max_uses=1 for truly single-use ops
Short batch job (< 1 hour) | 30–60 minutes | Match TTL to expected job duration
Long-running agent session | 2–8 hours | Request a new token before TTL expires
Service worker (e.g. API server) | Max 24 hours | Rotate token daily via cron; never share master key
CI/CD pipeline | Duration of pipeline + buffer | Use dedicated service account key per pipeline
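For long-running agent sessions, the table advises requesting a new token before the TTL expires. One way to schedule that is to refresh after a fixed fraction of the TTL; the 80% fraction here is an illustrative choice, not a product default:

```python
def next_refresh_seconds(ttl_seconds: int, safety_fraction: float = 0.8) -> int:
    """Seconds to wait before requesting a replacement token.

    Refreshing at 80% of the TTL leaves a buffer for the refresh request
    itself, so the agent never holds an expired token mid-task.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return max(1, int(ttl_seconds * safety_fraction))

# A 30-minute token would be refreshed after 24 minutes:
print(next_refresh_seconds(1800))  # 1440
```

An agent loop would sleep for next_refresh_seconds(ttl), call vault.request_token again, and swap in the new token before the old one lapses.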

Setting appropriate access tiers

Assign access tiers based on the blast radius if the credential were compromised. When in doubt, err toward a higher tier — you can always relax it after observing false-positive approval friction.

standard

Read-only keys, public data sources, sandbox credentials, rate-limited free-tier keys, non-production secrets.

sensitive

Write-capable API keys, OAuth tokens, staging database passwords, service-to-service tokens with meaningful access.

critical

Production databases, payment processor secrets, admin API keys, SSH private keys, signing keys, KMS credentials.
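The blast-radius rule above can be expressed as a small triage helper. The attribute names are illustrative, not part of the vault API:

```python
def suggest_tier(can_write: bool, high_blast_radius: bool) -> str:
    """Map credential properties to an access tier, erring high when in doubt.

    high_blast_radius covers the critical bucket: production databases,
    payment processor secrets, admin/signing/SSH/KMS keys.
    """
    if high_blast_radius:
        return "critical"
    # Write-capable keys, OAuth tokens, service-to-service credentials
    if can_write:
        return "sensitive"
    # Read-only, sandbox, rate-limited, or non-production credentials
    return "standard"

print(suggest_tier(can_write=True, high_blast_radius=True))   # critical
print(suggest_tier(can_write=True, high_blast_radius=False))  # sensitive
```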

IP allowlisting for production agents

IP allowlisting is one of the strongest defense-in-depth measures available: a stolen token is useless from the wrong IP address. Always specify allowed_ips for production token requests when your agents run from known infrastructure:

ip_allowlist.py
from agentsecretstore import AgentVault

async def configure_ip_restrictions():
    async with AgentVault() as vault:
        # Restrict a token to a specific agent host
        token = await vault.request_token(
            scope="secrets:read:production/stripe/*",
            ttl_seconds=3600,
            allowed_ips=["10.0.1.50"],        # Single agent
        )

        # Restrict to a subnet (GCP/AWS private subnet)
        subnet_token = await vault.request_token(
            scope="secrets:read:production/*",
            ttl_seconds=3600,
            allowed_ips=["10.128.0.0/20"],    # GCP us-central1 subnet
        )

        # Restrict to multiple known hosts
        multi_host_token = await vault.request_token(
            scope="secrets:read:production/ml-platform/*",
            ttl_seconds=7200,
            allowed_ips=["10.0.1.50", "10.0.1.51", "10.0.1.52"],
        )

Cloud provider IP ranges

GCP, AWS, and Azure publish their IP ranges as JSON files. For agents running on managed compute, allowlist the subnet CIDR of your VPC rather than individual IPs. This handles pod autoscaling without manual token updates.
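Before issuing a subnet-scoped token, it helps to sanity-check that your agent hosts actually fall inside the CIDR you plan to allowlist. Python's standard ipaddress module handles the membership test:

```python
import ipaddress

def ips_within_subnet(ips, cidr: str) -> bool:
    """True if every agent IP falls inside the allowlisted subnet."""
    network = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(ip) in network for ip in ips)

# Agents in a GCP us-central1 private subnet:
print(ips_within_subnet(["10.128.0.5", "10.128.3.200"], "10.128.0.0/20"))  # True
print(ips_within_subnet(["10.0.1.50"], "10.128.0.0/20"))                   # False
```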

Single-use tokens for high-risk operations

For one-time operations like payment processing, signing, or sending a notification, use max_uses=1 to create a burn-after-reading token. After the credential is retrieved once, the token is invalidated server-side — even if the token string leaks, it's already dead.

Combine with a short TTL (5 minutes) for maximum security: the token is useless after one use or after 5 minutes, whichever comes first.

Approval workflows for production secrets

For critical-tier secrets, require approval even when the requesting agent has appropriate permissions. This adds a human check-in point for sensitive operations:

approval_workflow.py
from agentsecretstore import AgentVault

async def request_with_approval():
    """
    For critical-tier secrets, the vault holds the request and sends
    a notification to your approver. The agent waits (or polls) for
    the decision before the token is returned.
    """
    async with AgentVault() as vault:
        # This call blocks until approved (or times out)
        token = await vault.request_token(
            scope="secrets:read:production/payments/stripe/secret-key",
            ttl_seconds=600,
            description="Payment processing for invoice #INV-2025-0891",
            require_approval=True,                  # Force approval even if not critical
            approval_timeout_seconds=300,           # 5-minute approval window
        )

        # Token only returned after approval
        print(f"Approved! Token: {token.value[:20]}...")

Rotation schedule recommendations

Credential type | Recommended rotation | Priority
Payment processor keys (Stripe, PayPal) | Every 14 days | 🔴 Critical
Production database credentials | Every 30 days | 🔴 Critical
SSH private keys | Every 30 days | 🔴 Critical
Production API keys (OpenAI, Anthropic) | Every 90 days | 🟡 High
OAuth tokens | On provider expiry + proactive 60-day rotation | 🟡 High
Staging/dev API keys | Every 180 days | 🟢 Medium
Webhook secrets | Every 180 days | 🟢 Medium
Read-only data source keys | Annually | ⚪ Low

Monitoring the audit trail for anomalies

Set up a recurring check (e.g. every 15 minutes) that queries the audit API for suspicious patterns. Key signals to watch:

anomaly_detection.py
from datetime import datetime, timezone, timedelta
from agentsecretstore import AgentVault

async def check_anomalies():
    async with AgentVault() as vault:
        one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)

        # 1. Spike in reads — possible exfiltration attempt
        recent_reads = await vault.audit.query(
            event_types=["secret.read"],
            since=one_hour_ago,
        )
        if len(recent_reads) > 500:
            alert(f"Unusual read volume: {len(recent_reads)} reads in 1 hour")

        # 2. Denied access attempts — possible probing
        denied = await vault.audit.query(
            event_types=["secret.read"],
            status="denied",
            since=one_hour_ago,
        )
        if len(denied) > 10:
            alert(f"Multiple denied reads: {len(denied)} in 1 hour")

        # 3. Off-hours access to critical secrets
        after_hours = [
            e for e in recent_reads
            if e.resource_tier == "critical"
            and not (9 <= e.timestamp.hour <= 18)
        ]
        if after_hours:
            alert(f"After-hours critical access: {len(after_hours)} events")

        # 4. Access from unexpected IPs
        known_ips = {"10.0.1.50", "10.0.1.51", "10.128.0.5"}
        unexpected = [e for e in recent_reads if e.ip not in known_ips]
        if unexpected:
            alert(f"Access from unexpected IPs: {[e.ip for e in unexpected]}")

def alert(message: str):
    print(f"🚨 ALERT: {message}")
    # Send to PagerDuty, Slack, etc.

If a key is compromised

Execute this runbook immediately when you suspect a credential has been leaked or compromised. Speed matters — do steps 1 and 2 before anything else:

1. Revoke all active tokens covering the secret (IMMEDIATE)
2. Rotate the secret at the provider + update vault (IMMEDIATE)
3. Pull full audit history for the compromised path (WITHIN 1 HOUR)
4. Identify all actor IDs and IP addresses that accessed it (WITHIN 1 HOUR)
5. Scope the timeline of unauthorized access (WITHIN 4 HOURS)
6. Export audit CSV for the incident report (FOR RECORD)
7. Review and tighten scope/tier for the affected secret (AFTER RECOVERY)
incident_response.py
from agentsecretstore import AgentVault

async def respond_to_compromise(compromised_path: str):
    """
    Incident response runbook — execute immediately when a key is suspected compromised.
    """
    async with AgentVault() as vault:

        # STEP 1: Revoke all active tokens that cover the compromised path
        tokens = await vault.tokens.list(resource_path=compromised_path)
        for token in tokens:
            await vault.tokens.revoke(token.id)
            print(f"Revoked token: {token.id} ({token.description})")

        # STEP 2: Rotate the secret immediately
        # You'll need the new value from your provider first
        # await vault.update_secret(path=compromised_path, value="new-key-here")

        # STEP 3: Pull the full audit log for the compromised key
        history = await vault.audit.query(
            resource_path=compromised_path,
            # No time limit — get everything
        )
        print(f"Total accesses: {len(history)}")

        # STEP 4: Identify all actor IDs that read this secret
        actors = {e.actor_id for e in history if e.event == "secret.read"}
        print(f"Actors that read this secret: {actors}")

        # STEP 4 (cont.): source IP addresses involved
        ips = {e.ip for e in history if e.event == "secret.read"}
        print(f"Source IPs: {ips}")

        # STEP 6: Export the audit CSV for the incident report
        csv_data = await vault.audit.export(
            resource_path=compromised_path,
            format="csv",
        )
        with open(f"incident-{compromised_path.replace('/', '-')}.csv", "wb") as f:
            f.write(csv_data)

Team member access control

Human access to the vault dashboard is managed through roles. Apply the same least-privilege principle to humans as to agents:

Role | Can do | Who should have it
Admin | All actions including member management and billing | Vault owner only (1–2 people)
Editor | Create, update, rotate secrets; manage approval policies | Platform/DevOps leads
Viewer | View secret metadata (never plaintext values) | Developers, on-call engineers
Auditor | Read-only audit log access; CSV export | Security team, compliance officers

Related documentation

Scoped Tokens

Deep dive into scope format, TTLs, and IP allowlisting.

Audit Trail

Query and export the full event history for compliance.

Secret Rotation

Automate credential rotation with zero downtime.

Compliance Roadmap

SOC 2, GDPR, HIPAA, and PCI-DSS status and plans.