Nvidia NemoGuard Jailbreak Detect Guardrail Evasion

Affected Vendor(s)

Affected Product(s)

Summary

Nvidia offer an open source Jailbreak classifier on HuggingFace called NemoGuard Jailbreak Detect. These classifiers are heavily used by a range of guardrail systems and are designed to enable developers of AI applications to detect and react to incoming prompt injections and jailbreaks. We successfully demonstrated how an attacker can fully evade, or greatly degrade, classification accuracy of the classifier, enabling prompt injections and jailbreaks to pass through filters and subsequently to the protected AI application.

Timeline

Discovered on
Disclosed to Vendor on
March 11, 2025
Published on
April 3, 2025

Credit

Blog Post

References

Learn how Mindgard can help you navigate AI Security

Take the first step towards securing your AI. Book a demo now and we'll reach out to you.