Where Augmentation Stops (Part 2 of 5)

In Part 1 I showed what AI in the SOC verifiably delivers: about 22 percent speed gain on tightly defined tasks. What marketing decks tend to leave out are the limits every honest implementation runs into. Most of them are technical, one comes from human-factors research. None is a tuning question. None disappears with the next model release. If you see it differently, explain how the architecture that produces augmentation suddenly stops producing adversarial attackability. I’m waiting for the argument.

TL;DR: AI in the SOC has four structural limits. LLM-based triage agents are themselves attackable (prompt injection through logs, IOC feeds, mail bodies), ML detectors break systematically on novel TTPs, generic LLMs have no client-specific memory, and automation bias comes from humans, not from the tool. None of them is a tuning question. None disappears with the next model release.

Your LLM triage agent is itself attackable

The problem is old, the vector is new. When an LLM-based triage agent ingests data the attacker can influence (and by definition it does: log files, IOC feeds, mail bodies, threat feeds, everything that comes in through the sensors), the attacker has a prompt-injection path. Greshake et al. formalized the concept in 2023 as “Indirect Prompt Injection” (arXiv:2302.12173). USENIX Security 2024 delivered the systematic benchmark with five attacks against ten defenses against ten LLMs across seven tasks. Even combined state-of-the-art defenses leave substantial residual risks open (Liu et al. 2024).

This is not a lab problem. In November 2025 the EchoLeak vulnerability was disclosed under CVE-2025-32711. A “poisoned email” exfiltrates data from an LLM mail assistant without user interaction. A Carlini et al. study on Google’s Magika classifier shows that a 13-byte change in a file bypasses the routing in 90 percent of cases, and even after defense hardening, 50 bytes suffice for 20 percent success (arXiv:2510.01676). NIST in AI 100-2 has systematically catalogued the classes of evasion, poisoning, and privacy attacks against ML and LLMs. The verdict: no silver-bullet mitigation (NIST 2023/2025).

What frustrates me most is how rarely threat models pick this up. The hope of solving the problem through prompt hardening, output filtering, or defense in depth is understandable. It works in tightly bounded settings. But the structural property remains. LLMs process text as instruction, the attacker controls a portion of the text. Whoever puts an LLM-based triage agent in production runs a service that is structurally attackable. That belongs explicitly in the threat model. Not in the caveat appendix of a slide deck. And no, the next model generation doesn’t fix this. The architecture that enables prompt injection is the same one that enables augmentation in the first place. If you see it differently, show me the mitigation NIST missed.

Novel TTPs break ML detectors systematically

ML-based classifiers break down when the attacker’s telemetry doesn’t match the training data’s telemetry. Not an anecdote. Measured. Catania and Garcia have systematically investigated inter-dataset generalization for ML-based NIDS. For network-centric classes like brute force, DoS, and DDoS, losses stay under five percent. For botnet detection, recall drops by 35 percent. For web attack and infiltration, precision partly collapses entirely (PMC9960990).

The research backs this up from several directions. Corsini and Yang show that established out-of-distribution techniques from computer vision and NLP are not directly transferable to NIDS (arXiv:2308.14376). Wilkie et al. document that zero-day NIDS, even with state-of-the-art contrastive learning, measurably trails the values for known classes (arXiv:2601.09902).

Whoever claims an ML detector reliably recognizes what it has never seen claims something the research methodologically excludes. Not polemical. The state of the research. Augmentation for known classes works. Detection for genuinely novel TTPs remains an open question, and the honest answer for twenty years has been the same: hypothesis-driven hunting by a senior analyst who translates threat intel onto her own telemetry. AI can speed up subtasks within that. It is not the answer.

LLMs have no client memory

The structural property with the least research volume but the largest practical impact: LLMs have no long-term, client-specific memory. The asset inventory of the client environment, the detection history, the business logic of individual applications, the organizational quirks (“we whitelist this tool because the engineering team needs it once a week”). None of that is known to the LLM unless it’s passed in the prompt context.

This is exactly where sales material steps in: “Client-Aware Detection”. In practice that’s a RAG pipeline that injects client data into the prompt window at runtime, with compute costs the vendor doesn’t price up front, and an additional prompt-injection vector on top. Not wrong, but engineering. Not the “out-of-the-box client-aware agent” the slide promises. What annoys me is the gap between label and reality, because it distorts the investment case: what’s sold as plug-and-play is in fact an engineering project with its own budget. Without that clarification, the customer buys a promise he later has to build himself.

Sworna et al. captured the reality in an empirical single-SOC study. LLMs are mainly used for microtasks, i.e. log summarization, script explanation, note drafting. Investigative tasks stay analyst-autonomous, because they need exactly this client context (arXiv:2508.18947). The ENISA assessment in the AI Threat Landscape reports adds: client context and threat-intel correlation aren’t replaceable by generic GPT knowledge.

“End-to-end AI triage without context engineering” exists as a marketing promise. In practice I don’t see it working. What works is AI in tightly bounded microtasks plus an explicit client-context layer, built, documented, and compute-budgeted by an engineer who knows the limits.

Automation bias is human, not technical

The last limit doesn’t come from cyber research, but from aviation and ICU research. Stanton and Plant documented in a 2025 MDPI Aerospace paper that automation bias and complacency, i.e. the “looking-but-not-seeing” effect where trained operators take over a faulty system recommendation without checking it, are prevalent in cockpit research and not reliably trainable away (MDPI Aerospace 5/2/42). A critical-care study with 80 stakeholders comes to the same conclusion for ICU settings (PMC11301121).

What worries me most about this research isn’t the existence of the effect. It’s the fact that aviation has been working on it for five decades with cockpit-resource-management curricula and still hasn’t eliminated it. Cybersecurity won’t crack in five years what aviation hasn’t cracked in fifty. The transfer to SOC operations is uncomfortable. When a triage agent issues recommendations, the analyst under time pressure adopts them more often than checking them. Aviation achieves reduction. Not elimination.

Whoever deploys AI augmentation in a SOC therefore buys not just a detection upside. He also buys an attention risk for the team. Whoever doesn’t document that risk in the assessment buys it anyway. He just doesn’t know what he has.

What this means in practice

These limits don’t disappear with the next model release. The architecture that produces adversarial attackability is the same one that produces productivity augmentation. Whoever wants to deploy AI augmentation in a SOC builds around these limits. Whoever ignores them builds on hope. Hope doesn’t scale in defense. Doesn’t scale in offense either, by the way. It’s just that on the offensive side the attacker notices later.

In Part 3 we look at the organizational consequences. What these limits mean for hiring, training, and architecture in a defensive team. And why “fewer staff” as the answer to AI augmentation regularly has to be taken back.

Part 2 of 5 in this series on AI in defensive cyber, augmentation, not replacement:

Part 1, What the data holds up
Part 2, Where augmentation stops (current)
Part 3, What it means for SOC teams
Part 4, AI vs AI
Part 5, How it could actually work