AI guardrails often claim high detection accuracy, but benchmark results can hide real-world gaps. Learn how buyers should evaluate guardrails against realistic attacker behavior.
One of the core pillars within the AI security industry is guardrails, also marketed as AI gateways, designed to monitor, detect, and block malicious instructions sent to and from LLMs and agents. There are now hundreds of offerings on the market, provided by a mix of well-funded startups and global enterprise technology companies, and demand is growing rapidly as organisations race to deploy AI systems at scale.
However, the majority of guardrails and gateways today are still rather nascent in terms of maturity. At Mindgard, we've already published research papers and presented webinars focused on their technical gaps. But rather than revisiting those shortcomings here, I want to address something equally important: how buyers should appraise and differentiate between solutions when purchasing from vendors. Buying the wrong guardrail isn't just a waste of budget, it's a false sense of security that can leave your organisation meaningfully exposed.
Vendors routinely claim their guardrail achieves 95–98%+ accuracy in detecting attacks. On the surface, these figures sound compelling. In practice, it's critical to understand how those statistics are derived and under what conditions.
When a prospective customer engages a vendor for a trial or evaluation, the process typically involves one or more of the following:
Real attackers do not stop when an obvious jailbreak is blocked. They adapt. They paraphrase, translate, fragment instructions across multiple turns, exploit context, manipulate role assumptions, and look for weaknesses in tools, retrieval systems, permissions, and downstream workflows. This is especially important in agentic systems, where the risk is not limited to what the model says, but what the system is allowed to do.

The cumulative effect of these evaluation approaches is that they can drift significantly into "marking your own homework" territory. Every form of evaluation involves trade-offs, and I don't expect perfection. But two concerns stand out.
First, buyers are increasingly relying on vendors to define what "good" looks like, including how to test for it at precisely the moment when customers are still developing their own understanding of AI security. This creates an asymmetry of knowledge that benefits vendors and disadvantages buyers.
Second, we've had multiple customers come to Mindgard after reporting a significant gap between a vendor's claimed detection rates and real-world performance once their gateway was exposed to motivated, realistic attackers. That gap isn't a minor variance, it can render the control substantially less effective than advertised.
Third, and more concretely, in our own customer engagements, we have yet to see a guardrail or gateway that cannot be bypassed when evaluated against adaptive, attacker-aligned techniques rather than static benchmark prompts.
The incentive structure driving this dynamic is understandable. Vendors operating in an increasingly crowded market need to stand out. Reporting stellar accuracy and performance figures especially as some vendors have already been acquired for substantive sums is a rational competitive strategy, even if the methodology behind those figures doesn't hold up to independent scrutiny.

But the pattern is a familiar and troubling one. It closely mirrors the Volkswagen emissions scandal, where performance figures were optimised for the test environment rather than real-world conditions. The consequences of that gap, when discovered, were significant. In AI security, the stakes are comparable: organisations will rely on these technologies to protect their systems, their users, and their data. A guardrail that performs brilliantly on a benchmark but fails against a determined attacker provides protection that is largely illusory.
This post is not an argument against using guardrails or gateways as part of your AI security architecture, they remain a valuable layer of defence when selected and deployed thoughtfully. Rather, it's a call for buyers to approach vendor evaluations with greater rigour and scepticism.
Specifically, consider:
The AI security market is maturing, and buyers are gaining sophistication. Closing the credibility gap between vendor claims and real-world performance starts with asking harder questions and expecting credible answers.
Join Mindgard founder Peter Garraghan on Thursday, June 11, from 11:00 to 11:30 AM ET for a live webinar on how buyers should evaluate AI guardrails and gateways. Peter will expand on the topics covered in this post, including benchmark limitations, vendor accuracy claims, real-world bypass techniques, and how security teams can test whether these controls hold up against adaptive attackers. The session will include a live Q&A.