Microsoft Azure AI Content Safety Guardrail Evasion

Affected Vendor(s)

Affected Product(s)

Summary

Azure OpenAI Studio enables developers to deploy OpenAI models within their Azure organisation. Developers can use a service called 'Azure AI Content Safety' to apply text moderation to the inputs and outputs of a deployed model, aiming to detect sensitive content such as hate speech and violence before it reaches downstream applications. We have successfully demonstrated how an attacker can fully evade, or greatly degrade, the classification accuracy of the text moderation service on a dataset of hate speech inputs.
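The summary above does not detail the evasion technique. As a hedged illustration only, the following toy sketch shows how one well-known class of guardrail evasion, character-level perturbation (here, zero-width space injection), can defeat a naive keyword-based moderator. All names in this sketch are hypothetical and do not represent Azure AI Content Safety's actual implementation or API.

```python
# Illustrative sketch, not the disclosed attack: demonstrates how
# character-level perturbation can evade a naive keyword moderator.
# BLOCKLIST and both functions are hypothetical names for illustration.

BLOCKLIST = {"attack", "violence"}

def naive_moderator(text: str) -> bool:
    """Flag text if any blocklisted keyword appears (naive matching)."""
    return any(word in text.lower() for word in BLOCKLIST)

def perturb(text: str) -> str:
    """Insert a zero-width space (U+200B) between every character.

    The text renders identically to a human reader, but substring
    matching against the blocklist no longer fires.
    """
    return "\u200b".join(text)

original = "an act of violence"
print(naive_moderator(original))           # flagged
print(naive_moderator(perturb(original)))  # evades keyword matching
```

A production classifier is more sophisticated than substring matching, but the research summarised above shows that perturbation-style attacks can still degrade real moderation services.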

Timeline

Discovered on
Disclosed to Vendor on March 4, 2024
Published on June 18, 2024

Credit

Blog Post

References
