Microsoft Azure AI Content Safety Guardrail Evasion

Affected Vendor(s)

Affected Product(s)

Summary

Azure OpenAI Studio enables developers to deploy OpenAI models within their Azure organisation. Developers can use a service called 'Azure AI Content Safety' to apply text moderation to the inputs and outputs of a deployed model, aiming to detect sensitive content such as hate speech and violence before it reaches downstream applications. We have successfully demonstrated how an attacker can fully evade, or greatly degrade, the classification accuracy of the text moderation service on a dataset of hate speech inputs.
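The summary above does not detail the evasion technique. As a hedged illustration only, the following toy sketch shows how one well-known class of guardrail evasion, character-level perturbation (here, zero-width space injection), can defeat a naive keyword-based moderator. All names in this sketch are hypothetical and do not represent Azure AI Content Safety's actual implementation or API.

```python
# Illustrative sketch, not the disclosed attack: demonstrates how
# character-level perturbation can evade a naive keyword moderator.
# BLOCKLIST and both functions are hypothetical names for illustration.

BLOCKLIST = {"attack", "violence"}

def naive_moderator(text: str) -> bool:
    """Flag text if any blocklisted keyword appears (naive matching)."""
    return any(word in text.lower() for word in BLOCKLIST)

def perturb(text: str) -> str:
    """Insert a zero-width space (U+200B) between every character.

    The text renders identically to a human reader, but substring
    matching against the blocklist no longer fires.
    """
    return "\u200b".join(text)

original = "an act of violence"
print(naive_moderator(original))           # flagged
print(naive_moderator(perturb(original)))  # evades keyword matching
```

A production classifier is more sophisticated than substring matching, but the research summarised above shows that perturbation-style attacks can still degrade real moderation services.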

Timeline

Discovered on
Disclosed to Vendor on March 4, 2024
Published on June 18, 2024

Credit

Blog Post

References
