AI Red Teaming Trends: What Security Leaders Need to Prepare for Next

Key Takeaways

  • AI attacks are evolving quickly beyond prompt injections to target agentic systems at scale including tool access, memory, supply chain vulnerabilities, and multimodal inputs.
  • AI red teaming requires persistent hybrid human and automated testing throughout development pipelines vs. point in time testing.
AI cybersecurity shield visual representing AI threat detection, prompt injection defense, and enterprise AI security

In This Article

Threats against AIs are already quantifiable, growing rapidly in frequency, and becoming more advanced. 

Fully automated jailbreak agents have achieved success rates as high as 97.14% in one study. Multimodal attacks have scored up to 97-99% in some tests. Attacks may have low success rates and still pose significant risks when used at scale. 

Indirect prompt injection attacks succeed only 0.5%-8.5% of the time, but can still lead to vast windows of exposure for high-volume applications. Most concerning: Security researchers estimate that upwards of 73% of tested LLM applications are vulnerable to prompt injection. According to Trend Micro, 26.2% of AI-related CVEs are ranked as "high" or "critical" severity.

AI is only going to get more agentic, multimodal, and integrated into enterprise workflows. As a result, red teaming efforts are expanding well past testing of individual models. The largest AI red teaming trends today center around scale, persistence, and real-world impact: everything from automated testing that never sleeps to attacks targeting your tools, memory, permissions, and even the AI supply chain. These AI red teaming trends are already having an impact on how organizations protect AI at scale. 

What Experts Are Saying About the State of AI Security

AI security monitoring interface highlighting AI risk detection, alerts, and enterprise threat protection.

The discussion around AI security and AI red teaming has evolved significantly over the last year. Where we once saw timid academic hypothetical speculation, we now see alarmed calls to action. Security executives, AI researchers, and policy makers are nearly unanimous in their belief that organizations must radically change how they test and harden AI systems. And they're running out of time.

Apostol Vassilev works as a research team supervisor for the National Institute of Standards and Technology, with a specialization in adversarial machine learning. He’s been vocal about why point-in-time security testing just doesn’t work anymore:  “...the security of AI systems is not a static problem—one that can be solved once and done,” he has said

Researchers like Vassilev point out that there is no silver bullet set of guardrails to protect you against every possible adversarial prompt. Instead, he recommends thinking of red teaming as a form of proactive, continuous improvement: not yearly or monthly audits, but frequent testing and tweaking of your defenses so you identify vulnerabilities before bad actors do.

“These attacks exploit AI vulnerabilities by manipulating model behavior. These threats evolve rapidly, taking advantage of AI’s reliance on untrusted inputs and its opaque, black-box like, decision-making,” says Mindgard CEO and Co-founder, Dr. Peter Garraghan, in an interview published by AI TechPark

“Traditional security solutions, like code scanners or firewalls, fail to address these risks because vulnerabilities emerge at runtime rather than in static code,” he further explains. “The only way to truly mitigate these threats is with continuous, automated security testing that’s always learning from what it identifies and defends against. Conventional security tools aren’t designed to work continuously in this matter, let alone learn from their own functions.”

From Prompts to Pipelines: Where AI Attacks Actually Happen Now

AI security monitoring interface highlighting AI risk detection, alerts, and enterprise threat protection.

One prompt injection can set off a cascade of downstream effects with zero human oversight. That’s why we’re seeing agentic AI red teaming shift beyond model outputs and into real-world system behavior. The new tests probe how AIs agents call tools, take actions, and interact with outside systems. (Adversa)

Multimodal jailbreaks are creating a brand new attack surface. New research into vision-language models found that you can now encode malicious intent through images and cross-modal inputs, not just text. That means red teaming AI now includes breaking how models reason across modalities. (arXiv)

Your AI supply chain is part of your attack surface. Things like third-party models, integrations, and automation pipelines are where security will broaden. Injection flaws and sensitive data leaks won't just spring up within your applications. They can be seeded from outside your system. (Hackread)

Indirect prompt injection is becoming one of the fastest growing threat vectors. Rather than targeting the model itself hackers are hiding instructions in web pages, documents, and other external content consumed and acted upon by AI. (Practical DevSecOps)

Prompts and policy layers are targets as well. Increasingly, red teaming considers how underlying prompts can be engineered to evade protections or change a model's behavior. (Practical DevSecOps)

Retrieval augmented generation increases risks around data exfiltration. You may want to start testing explicitly around how sensitive information could leak through external knowledge bases and retrieval processes. (Invisible Technologies)

Videos can also hold secret instructions for LLMs. Hidden instructions embedded in images that make up video frames can subvert a model's behavior while escaping detection by human reviewers. (arXiv)

Document formatting itself is an attack surface. Layouts themselves can be adversarial, as can structured documents. (Practical DevSecOps)

Agentic AI Is Changing the Threat Model Entirely

Visualization of agentic AI systems communicating across a connected AI workflow network


Attacks are moving beyond single prompt attacks to chained workflows. Red teaming will evolve to encompass trying out attacks that behave differently over multiple turns of conversation and utilize stored memory, context, and behavior. (Tredence)

Agents can also create entirely new classes of risks. Current AI red teaming trends include testing for improper tool use and the ability of agents to run actions or invoke APIs that they aren’t authorized to use. (AWS)

Lateral movement becomes possible against agentic AI. Attackers can leverage agent permissions/access coupled with session context to pivot around environments. (AWS)

Agent memory becomes its own security perimeter. Stored data, session history, and long-term memory manipulation are becoming common areas of red teaming focus. (AWS)

Excessive permissions have always been a risk for cybersecurity, but agentic AI raises the stakes. More red teams are looking at unauthorized actions, privilege escalation, and workflow hijacking to test for excessive permissions. (Cycode)

Manual Red Teaming Can’t Keep Up Anymore

AI is transforming how security leaders practice red teaming. Gone are the days of static exercises. Enterprises need continuous, automated adversarial testing against production-like environments. This AI red teaming trend will help ensure you’re testing your models against the latest attack vectors. (Mindgard)

Red team automation is already well underway. From automatically generating adversarial payloads to orchestrating attacks and testing thousands of attack scenarios at scale, AI systems have taken on much of the heavy lifting. However, this doesn’t mean AI will replace humans. On the contrary, it allows human red team members to focus on more sophisticated strategy and attack design. (Hackread)

AI-based attack payloads are increasing speed exponentially. Generative technologies can be used to craft large volumes of varied, high-fidelity attack input. You can pre-position far more realistic coverage everywhere you need it with the proper approach. (Hackread)

Autonomous reconnaissance is critical regardless of your organization's size. New technologies can identify AI attack surfaces, AI-mimic attack simulations, and automated/scriptless adaptive black-box testing. (Invicti)

Regression testing is mandatory. Each change to models, prompts or systems creates new risk. Mature AI security programs retest everything continually. (Invicti)

Speed still matters, of course. Regression testing is no exception. LLM agents can finish advanced attack tasks within minutes: 5,000x faster than humans in some cases. (arXiv)

AI Security Is Moving Into the Development Pipeline

Developer reviewing code on a laptop during an AI security or red teaming session
Photo by Nangialai Stoman from Unsplash

Security validation doesn’t stop at test time anymore. Frameworks are placing increased focus on validating controls and enforcing boundaries, as well as monitoring behavior at runtime, where failures can occur in production. (AWS)

Red teaming will be integrated into the SDLC itself. The use of CI/CD and MLOps pipelines allows models to be retested constantly after adjustments instead of periodic reviews. (Hackread)

Red teaming outputs are increasingly structured as audit-ready reports. These reports are designed for compliance, governance frameworks, and executive decision-making. (National Law Review)

Findings aren’t simply patched together for technical resolutions anymore. Remediations are now ranked by potential business impact. Next generation tools provide risk scoring and actionable recommendations. (Hackread)

Red team reports are feeding directly into protections. Tools today aren’t just finding vulnerabilities. They’re being leveraged to create guardrails within production environments. (Hackread)

Organizations have always valued governance, but we’re starting to see them bake it into their development pipelines. With standardized frameworks like NIST AI RMF and ISO 42001, teams are formalizing testing processes to align with enterprise risk management. (Redteams.ai)

The Best AI Red Teams Aren’t Fully Human or Fully Automated

Hybrid is the future of red teaming. Human creativity is best used in tandem with automation that can scale and adjust testing on the fly. (Mindgard)

If you aren’t benchmarking already, now’s the time to start. There can be huge differences in vulnerabilities and attack success rates between models, so testing across models is important to any red teaming strategy. (arXiv)

Studies have indicated that successful techniques against one model often transfer to others. Defenses that focus on stopping attacks against a single model may be less effective. (arXiv)

Competition-style red teaming in the public realm is speeding up progress. Benchmark-driven competitions allow researchers to quickly identify new attacks and validate testing procedures. (arXiv)

Real world user simulation is now the baseline. Attackers don’t operate cleanly or predictably anymore so testing must take on more realistic user behavior. (Tredence)

Red teams are incorporating AI applications built into productivity software. These software solutions can be abused to directly impact business processes. (Tredence)

AI Red Teaming Is Moving From Niche to Necessity

Team collaborating on AI security strategy with digital network and machine learning overlays

One advantage of managed services is that red teaming becomes available to more organizations. Data suggests there’s already demand for “red teaming as a service,” particularly among organizations too small to staff AI security teams in-house. (Research and Markets)

AI red teaming is increasingly important in highly regulated industries. Finance, healthcare, and government industries are investing heavily in AI red teaming to meet growing compliance mandates. (Research and Markets)

According to analysts, the AI red teaming market is expected to grow from $1.75 billion in 2025 to $4.8 billion by 2029. (National Law Review)

The Numbers Behind AI Risk Are Hard to Ignore

Autonomous jailbreak agents remain successful. One study showed these jailbreak methods succeeding 97.14% of the time, demonstrating the power of sophisticated reasoning systems to circumvent model protections. (Nature)

Jailbreaks with multimodal inputs are reaching comparable rates of success. One assessment found jailbreak success rates between 97-99% for some scenarios. (arXiv)

Attack success rate depends heavily on model. Researchers tested jailbreak prompts with GPT-3.5 and found them to have a success rate of 21.12%. Jailbreaks didn't fare as well with other models. Due to this, model-specific evaluation is still considered a leading AI red teaming trend. (NDSS Symposium)

Low success rates could still be bad news if you're operating at scale. Researchers found indirect prompt injection attacks could have success rates between 0.5%-8.5%. (arXiv)

Red teaming exercises should simulate persistent attacks instead of one-off attempts. Persistence increases the likelihood of success exponentially. Studies have found that attack success rate jumps from 13% for single-turn attacks to 92% for multi-turn conversations. (VentureBeat)

Prompt injection remains the most pervasive vulnerability. Some reports suggest that more than 73% of tested LLM apps are vulnerable to prompt injection. (SQ Magazine)

Still, indirect prompt injection is growing faster than direct attacks. Attackers prefer this option because they can use trusted content sources to bypass a lot of AI defenses. (SQ Magazine)

The distinguishing factor for AI security is model-specific behavior. The identical jailbreak prompt can work on one model but not another. This makes broad defenses unreliable. (Berkeley AI Research)

AIRTBench found that models performed significantly worse on system exploitation and model inversion attacks compared to prompt injection attacks. This suggests that different attack types need different defenses. (arXiv)

Defense against multi-turn attacks needs to account for several turns of conversation within and across sessions. Attackers can string together interactions both within sessions and across memory, rendering defenses that operate at the “per prompt” level ineffective. (VentureBeat)

AI security risks are serious. 26.2% of CVEs scored as AI-related by Trend Micro were rated “high” or “critical” severity. These attacks have an impact in the real world, so take risks seriously if you’re investing in AI. (Trend Micro)

The Stakes Have Never Been Higher: Why AI Red Teaming Can’t Wait

As the threat landscape continues to grow, one thing has become abundantly clear: AI security is today’s operational imperative. Attacks are evolving in sophistication, persistence, and targeting across models, agents, pipelines, and the entire AI supply chain. If you’re still relying on point-in-time audits or manual testing, you’re already behind.

Developed on more than 10 years of AI security research out of Lancaster University, Mindgard is designed from the ground up to help you tackle the risks conventional application security tools were never designed to address. Mindgard’s platform acts as your AI red teaming engine that autonomously maps attack surfaces, performs reconnaissance, and discovers actionable vulnerabilities across AI models, agents, and applications before the bad actors do.

Rather than relying on red teaming as a point-in-time activity, Mindgard’s Offensive Security platform brings red teaming into your dev and SecOps workflows so you can continuously test throughout your AI development lifecycle. And if something does slip into production, Mindgard closes the loop with runtime detection and response that applies contextual guardrails and provides remediation recommendations in real time.

When a single multi-turn conversation can spike attack success from, when more than 73% of LLM deployments are vulnerable to prompt injection attacks, and when leading analysts expect the AI red teaming market to nearly triple by 2029, security leaders need to determine whether their current strategy is keeping pace. Mindgard is built for businesses that don’t want to learn that answer the hard way. Schedule a demo to see Mindgard in action.

Frequently Asked Questions

What skills do AI red teamers need today?

AI red teaming isn’t just about security; today, it’s a hybrid role. The most effective red teamers combine traditional offensive security skills (like penetration testing and threat modeling) with a working understanding of machine learning systems, prompt engineering, and AI workflows. Just as important, though, is systems thinking. Modern red teamers need to understand how models interact with APIs, tools, data pipelines, and users.

How is AI red teaming different from traditional penetration testing?

Penetration testing usually examines infrastructure, networks, and applications for vulnerabilities. AI red teaming looks for vulnerabilities in behavior: how something interprets inputs, decides what to do, and acts.

How should AI red teaming effectiveness be measured?

The goal isn't just "finding vulnerabilities." Like with any type of security testing, success should be measured by how much more resilient you are with each session. Common metrics include:

  • Time-to-detection
  • Reduction in repeat vulnerabilities
  • Coverage across workflows
  • Remediation speed