AmtocSoft Tech Insights: When the Attacker Has an LLM: Defending Against AI-Developed Exploits

Sunday, May 31, 2026

When the Attacker Has an LLM: Defending Against AI-Developed Exploits

A split screen: an LLM generating exploit code on one side, a defender's detection dashboard on the other

Introduction

The first time I watched an LLM agent move through a compromised box in a red-team replay, what unsettled me was not the cleverness. It was the patience. The agent enumerated the environment, found a cloud metadata endpoint, pulled temporary credentials, and started listing S3 buckets, and it did all of it in the flat, tireless cadence of something that never gets bored and never fat-fingers a command. No typos, no hesitation, no coffee break I could catch it during.

That cadence is the story of 2026. Attackers now have the same agentic tooling we use to ship features. In May 2026 researchers documented the first known zero-day 2FA bypass developed with AI assistance and used for mass exploitation (The Hacker News, 2026). Around the same time, an unknown threat actor was observed using an LLM agent for post-compromise actions after exploiting a vulnerability in Marimo, extracting cloud credentials and SSH keys (2026: The Year of AI-Assisted Attacks, The Hacker News). The defensive side is arming up too: OpenAI shipped Daybreak with GPT-5.5-Cyber tooling for vulnerability detection and patch validation (The Hacker News, 2026).

This post is a defensive playbook, and only a defensive one. We will not write exploit code. We will model what an attacker's agent actually does, then build the controls that blunt it: shrinking blast radius, scoping credentials so a stolen token is nearly worthless, and detecting the specific behavioral signature of an agent loose in your environment.

The Problem: The Economics of Attack Just Changed

For two decades the limiting reagent in offensive security was skilled human time. Finding a novel bug, chaining it, and then carefully working through a network without tripping alarms took an expert and took hours or days. That scarcity is what most of our defenses quietly assumed. Rate-based alerting, "this looks like a human fumbling," and the hope that an attacker would make a noisy mistake all lean on the cost of human attention.

An LLM agent removes that assumption. It does not lower the ceiling of what the best human attacker can do; it lowers the floor and widens the base. A mediocre operator with a capable agent can now enumerate, pivot, and exfiltrate with a consistency that used to require real expertise. The three shifts that matter for defenders:

Speed of post-compromise. Once an attacker is in, an agent can enumerate identity, storage, and secrets in seconds, not the minutes or hours a human takes. Your detection window shrinks accordingly.
Tirelessness over breadth. Agents try everything. Every endpoint, every default credential, every misconfigured bucket, methodically, without the fatigue that makes humans take shortcuts. The long tail of "nobody would bother checking that" is now checked.
Adaptation in the loop. When a step fails, the agent reads the error and adjusts, the same recovery behavior we build into our own production agents. A blocked path is information, not a dead end.

Threat model diagram showing an attacker LLM agent's loop against a target environment and the defensive controls at each stage

The uncomfortable truth is that none of these are new attack techniques. Credential theft from a metadata endpoint is ancient. What changed is that the cost of executing the full chain, competently, dropped close to zero. Defenses calibrated to human cost need recalibrating to machine cost.

How It Works: Modeling the Attacker's Agent

To defend against an agent you have to think like one. An offensive agent runs the same observe-decide-act loop as any other, against your environment as its world.

flowchart TD A[Initial access] --> B[Enumerate identity + cloud metadata] B --> C[Pull temporary credentials] C --> D{Credentials useful?} D -->|yes| E[List storage, secrets, network] D -->|no| F[Try next identity / endpoint] F --> B E --> G[Stage exfiltration] G --> H[Persist or pivot]

The single highest-value link in that chain for an attacker is the jump from "code execution on a box" to "valid cloud credentials." That is the Marimo case in a nutshell: exploit the app, then have the agent hit the instance metadata service and walk away with temporary credentials and SSH keys. Everything downstream depends on that pivot succeeding.

Which is exactly why it is the best place to defend. If the credentials an agent steals are tightly scoped, short-lived, and bound to the context they were issued in, the agent's enumeration returns a pile of AccessDenied errors instead of a map of your data. The agent is patient, but it cannot brute-force a permission it was never granted.

So the defensive strategy writes itself from the attacker's loop: make the credential pivot yield as little as possible, and make the enumeration that follows loud enough to catch.

Implementation Guide: Defensive Controls That Bite

Let us build the three controls that matter most, in priority order.

Control 1: Scope and bind credentials

The first control is making stolen credentials nearly worthless. On AWS this means IMDSv2 with a hop limit of 1 so a compromised container cannot reach the metadata service at all, plus instance roles scoped to exactly the actions the workload needs. Here is the policy posture, expressed as a check you can run against a role.

import json

DANGEROUS_ACTIONS = {"*", "s3:*", "iam:*", "secretsmanager:*", "sts:AssumeRole"}

def audit_role_policy(policy: dict) -> list[str]:
    """Flag overly-broad grants that turn a stolen token into a skeleton key."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources

        for a in actions:
            if a in DANGEROUS_ACTIONS:
                findings.append(f"broad action '{a}' grants too much on compromise")
        if "*" in resources and any("*" not in a for a in actions):
            findings.append("Resource '*' widens blast radius; scope to ARNs")
        if not stmt.get("Condition"):
            findings.append("no Condition block; bind to VPC/source IP/MFA")
    return findings

Run it against a deliberately sloppy role and it tells you exactly where the blast radius is:

$ python audit_iam.py --role app-runtime-role
[audit] role: app-runtime-role
  ! broad action 's3:*' grants too much on compromise
  ! Resource '*' widens blast radius; scope to ARNs
  ! no Condition block; bind to VPC/source IP/MFA
3 findings -> a stolen token from this role reads every bucket in the account

The fix for each finding is mechanical: replace s3:* with the four or five specific actions the app uses, pin Resource to named ARNs, and add a Condition binding the credential to the VPC it was issued in. A credential that only works from inside your VPC is dead weight the moment an agent tries to use it from its own infrastructure.

Control 2: Make enumeration loud

An offensive agent's defining behavior is breadth. It calls many different APIs in a short window because it is mapping the environment. A human operator rarely does this; they have a hypothesis and act on it. That difference is detectable. The signal is not request rate (agents can throttle) but request diversity: many distinct sensitive actions from one identity in a short window.

from collections import defaultdict
import time

SENSITIVE = {
    "ListBuckets", "GetSecretValue", "ListSecrets", "DescribeInstances",
    "ListAccessKeys", "GetParameter", "AssumeRole", "ListRoles",
}

class EnumerationDetector:
    def __init__(self, window_s: float = 60.0, distinct_threshold: int = 5):
        self.window_s = window_s
        self.distinct_threshold = distinct_threshold
        self.seen: dict[str, list[tuple[float, str]]] = defaultdict(list)

    def observe(self, identity: str, action: str, now: float) -> str | None:
        if action not in SENSITIVE:
            return None
        events = self.seen[identity]
        events.append((now, action))
        # drop anything outside the window
        cutoff = now - self.window_s
        self.seen[identity] = [(t, a) for (t, a) in events if t >= cutoff]
        distinct = {a for _, a in self.seen[identity]}
        if len(distinct) >= self.distinct_threshold:
            return (f"enumeration: {identity} touched {len(distinct)} distinct "
                    f"sensitive actions in {self.window_s:.0f}s")
        return None

Replay a captured agent trace through it and the alert fires well before exfiltration:

$ python detect_enum.py --trace marimo-replay.jsonl
t+0.4s  i-09fa ListBuckets
t+0.9s  i-09fa DescribeInstances
t+1.1s  i-09fa ListSecrets
t+1.4s  i-09fa GetSecretValue
t+1.6s  i-09fa ListAccessKeys
ALERT  enumeration: i-09fa touched 5 distinct sensitive actions in 60s
       -> isolate i-09fa, rotate its role credentials

Five distinct sensitive calls in under two seconds is not a human reading a dashboard. It is an agent mapping your account, and you want to isolate the instance and rotate its role before the chain reaches GetSecretValue for the secrets that matter.

Control 3: A decision flow for response

Detection without a fast, pre-decided response just means you watch the breach happen in higher resolution. The response to a suspected agent has to be automatic for the early steps, because you are racing something that does not pause.

flowchart TD A[Enumeration alert] --> B{Identity type} B -->|workload role| C[Auto-revoke session, rotate role creds] B -->|human user| D[Step-up MFA challenge] C --> E[Snapshot + isolate host from network] D -->|fails| C D -->|passes| F[Log, lower severity, keep watching] E --> G[Open incident, preserve forensics]

The key design choice: for a workload identity, revoke first and ask questions later. A application server role has no business enumerating secrets, so an enumeration alert on it is almost certainly compromise, and the cost of a wrongful revoke is a single restarted pod. For a human identity the same behavior might be a security engineer doing legitimate work, so you challenge with step-up MFA instead of cutting them off. Matching the response to the identity type keeps the automation aggressive where it is safe and careful where it is not.

A Gotcha: The Detector That Cried Agent

The first version of my enumeration detector keyed on request rate, a count of sensitive calls per minute regardless of which calls they were. It looked reasonable in testing and then drowned us in false positives the first morning it ran in production.

The culprit was our own backup job. Every night at 02:00 a perfectly legitimate workload listed every bucket, described every instance, and read a batch of parameters to do its work. By raw rate it looked exactly like an attacker enumerating the account, and the detector dutifully revoked its credentials mid-run. The backup failed, paged on-call, and taught me that rate alone cannot tell a busy friend from a curious foe.

$ grep ALERT detector.log | head -3
02:00:03 ALERT rate: backup-role 41 sensitive calls/min -> REVOKED
02:00:03 backup job failed: credentials revoked mid-run
02:00:04 page: oncall woken for a false positive

The fix was twofold. First, switch the signal from rate to distinct-action diversity within a short window, which is what the code above does, because a backup job hammers the same small set of actions repeatedly while an attacker's agent touches many different actions as it explores. Second, maintain an allowlist of known-broad workload identities with their expected action sets, and exempt a call only when it matches the baseline that identity always exhibits. After that change the nightly backup stopped tripping the alert and a synthetic agent replay still fired every time. The lesson generalizes: a security control that fights your own legitimate automation gets disabled by the team it annoys, and a disabled control protects nothing.

Hardening the Pivot Itself

The two controls above blunt the agent once it is moving. The cheapest win, though, is to break the pivot at its source so the agent never gets a usable credential in the first place. The metadata service is the hinge, and there are three concrete settings that turn it from an open door into a locked one.

First, enforce IMDSv2 and set the response hop limit to 1. IMDSv1 answers any process that can make an HTTP request to 169.254.169.254, which is exactly the request a compromised app makes. IMDSv2 requires a session token obtained with a PUT, and the hop limit of 1 means a containerized workload cannot reach the host metadata service through the extra network hop. A surprising number of credential-theft chains die right here, because the agent's first GET to the metadata endpoint simply times out.

def check_imds_hardening(instance_md: dict) -> list[str]:
    findings = []
    if instance_md.get("HttpTokens") != "required":
        findings.append("IMDSv2 not enforced; v1 lets any process pull creds")
    if instance_md.get("HttpPutResponseHopLimit", 2) > 1:
        findings.append("hop limit > 1; containers can reach host metadata")
    if instance_md.get("HttpEndpoint") == "enabled" and not instance_md.get("needs_metadata"):
        findings.append("metadata endpoint enabled on a workload that never uses it")
    return findings

Running it across a fleet finds the instances that are one bug away from leaking credentials:

$ python check_imds.py --fleet prod
i-09fa  ! IMDSv2 not enforced; v1 lets any process pull creds
i-09fa  ! hop limit > 1; containers can reach host metadata
i-2b71  clean
i-7c44  ! metadata endpoint enabled on a workload that never uses it
2 of 3 instances expose the credential pivot -> remediate i-09fa, i-7c44

Second, treat any metadata access from outside your known credential-refresh path as an alert in its own right. Your SDK refreshes credentials on a predictable schedule from a predictable process. An agent hitting the endpoint mid-exploit does not match that pattern, and the mismatch is a high-fidelity signal precisely because legitimate metadata access is so regular.

Third, prefer credentials that are bound to their network context. A session token that carries a Condition requiring it be used from the issuing VPC is useless to an agent operating from its own infrastructure, even if the agent manages to exfiltrate the raw token. You have turned a portable skeleton key into a key that only works in one lock, in one building, that you control.

Comparison and Tradeoffs

How do the available postures against agent-driven attacks compare? Here is how I weigh them.

Control	Stops credential pivot	Catches enumeration	False-positive risk	Effort	Verdict
Perimeter firewall only	No	No	Low	Low	Necessary, not sufficient
Broad IAM + trust the network	No	No	Low	Low	This is how breaches spread
Scoped + bound credentials	Yes	No	Low	Medium	Highest single-control payoff
Rate-based anomaly detection	No	Weak	High	Medium	Noisy, fights your own jobs
Diversity-based detection	No	Yes	Low	Medium	The detection worth building
Scoped creds + diversity detect + auto-response	Yes	Yes	Low	High	The posture you actually want

flowchart LR subgraph Human["Human-cost era"] H1[Slow recon] --> H2[Manual pivot] --> H3[Noisy mistakes catch them] end subgraph Agent["Agent-cost era"] A1[Instant recon] --> A2[Tireless pivot] --> A3[No mistakes to catch] end Human -.defenses assumed this.-> Agent Agent -.recalibrate to machine cost.-> Now[Scope + detect + auto-respond]

Comparison visual: human-cost-era defenses versus agent-cost-era defenses side by side

The central tradeoff is automation aggressiveness versus operational disruption. Auto-revoking a workload credential on an enumeration alert is the right call, and it will occasionally restart a pod that did nothing wrong. That is a cheap price. Auto-revoking a human's session on the same signal is not, because the disruption is high and the legitimate-use rate is real, which is why the response flow forks on identity type. Get that fork wrong in either direction and you either miss real attacks or train your team to route around the control.

Production Considerations

A few things that matter once these controls are live.

Test against replays, not theory. Capture real agent traces in a lab (your own red team driving an agent against a disposable environment) and replay them through your detectors continuously. A detector that has never seen an agent trace is a detector you are debugging during an incident.

Make revocation fast and reversible. The response is only useful if revoking a session takes seconds and reinstating a wrongly-revoked workload is a one-click action. Slow revocation loses the race; irreversible revocation makes the team afraid to let it run.

Watch the metadata service like a hawk. The credential pivot runs through instance metadata. Enforce IMDSv2, set the hop limit to 1, and alert on any access to the metadata endpoint from a process that is not your known credential-refresh path. That single endpoint is the hinge the whole attack chain swings on.

Assume the attacker has read your runbooks. An LLM agent can ingest your public documentation, your blog posts, even this one. Defenses that depend on the attacker not knowing your layout are weak. Defenses that hold even when the attacker knows everything, scoped credentials, hard network boundaries, fast revocation, are the ones that last.

Conclusion

AI did not invent new attacks. It made the existing ones cheap, fast, and tireless, which breaks the human-cost assumption baked into a generation of defenses. The response is not panic and not a new product. It is recalibration: assume the attacker has a capable agent, then make the moves that agent depends on yield as little as possible.

Three controls carry most of the weight. Scope and bind credentials so the post-compromise pivot returns AccessDenied instead of your data. Detect enumeration by action diversity, not raw rate, so you catch the agent mapping your environment without revoking your own backup job. And pre-decide an automatic response that forks on identity type, aggressive for workloads, careful for humans. None of this requires matching the attacker's model. It requires building the environment so that a tireless, patient, perfectly-competent agent still hits a wall.

Working code for the IAM auditor, the diversity-based detector, and a replay harness that drives a simulated agent trace through both lives in the companion repo: github.com/amtocbot-droid/amtocbot-examples/tree/main/261-llm-attacker-defense.

Get the next one

I send a weekly security-engineering note with one production failure pattern, the debug trail, and the control or detector that came out of it. No spam, unsubscribe anytime.

👉 Subscribe (free)

Reader challenge: run one read-only IAM or metadata-service audit against a non-production account and look for the first credential pivot an agent would try. Reply to the email or comment with the control that was missing.

Revision History

Date	Summary	Old Version
2026-06-07	Added the newsletter signup and reader-challenge block so this recent AI-security post feeds the owned audience funnel.	View previous version

Sources

About the Author

Toc Am

Founder of AmtocSoft. Writing practical deep-dives on AI engineering, cloud architecture, and developer tooling. Previously built backend systems at scale. Reviews every post published under this byline.

LinkedIn X / Twitter

Published: 2026-06-02 · Updated: 2026-06-07 · Written with AI assistance, reviewed by Toc Am.

Get These In Your Inbox

Weekly deep-dives on AI engineering, no fluff. Join the newsletter →

Subscribe (free)

Or grab the book ($39, ~100 pages) · Buy me a coffee

☕ Buy Me a Coffee · 🔔 YouTube · 💼 LinkedIn · 🐦 X/Twitter

AmtocSoft Tech Insights