Mindgard

Protect AI Jailbreak and Prompt Injection Guardrail Evasion

Affected Vendor(s)

Protect.ai

Affected Product(s)

Prompt Injection and Jailbreak Classifiers

Summary

Protect AI offer two Prompt Injection and Jailbreak classifiers on HuggingFace. These classifiers are heavily used by a range of guardrail systems and are designed to enable developers of AI applications to detect and react to incoming prompt injections and jailbreaks. We successfully demonstrated how an attacker can fully evade, or greatly degrade, classification accuracy of the classifier, enabling prompt injections and jailbreaks to pass through filters and subsequently to the protected AI application.

Timeline

Discovered on

Disclosed to Vendor on

March 12, 2025

Published on

March 31, 2025

Credit

William Hackett

Lewis Birch

Blog Post

Bypassing Azure AI Content Safety Guardrails

References