Product Update

AI Red Teaming for Image Models

Explore the latest update on AI Red Teaming for Image Models and understand the significance of securing AI systems against adversarial attacks for a safer future.

With the advent of AI and its rapid integration into almost every facet of our lives, mechanisms to ensure its security and trustworthiness are now a necessity. As outlined in NIST’s AI Risk Management Framework and practiced by leading AI-mature enterprises, red teaming AI systems is an effective approach to achieving AI security. Red teaming identifies and mitigates security threats to AI applications, allowing organizations to accelerate their AI adoption with confidence.

We are excited to announce that starting today, Mindgard supports automated red teaming of image AI models hosted anywhere! This new capability enables AI and security teams to conduct rigorous testing of image models to identify weaknesses, biases, and potential misuse scenarios, ultimately aiming to enhance their security, reliability, and ethical performance across a wide range of applications.


Image AI at Risk

At Mindgard, we have been developing and implementing adversarial Machine Learning (ML) attacks that exploit vulnerabilities within AI models to deceive or manipulate them, or to extract sensitive information. In the case of image classification models, evasion attacks introduce subtle, often imperceptible alterations to input images that can lead the model to make incorrect classifications or predictions.
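To make the idea concrete, here is a minimal, self-contained sketch of an evasion attack against a toy linear classifier. It is illustrative only (not one of Mindgard’s attacks): a small, bounded change to every pixel is searched for until the prediction flips.

```python
# Conceptual evasion attack on a toy linear "image classifier" (illustrative
# only, not Mindgard's implementation). We search for a small, bounded
# per-pixel change that flips the model's prediction.
import numpy as np

rng = np.random.default_rng(0)

# Toy classifier over flattened 8x8 grayscale images: positive score -> class 1.
weights = rng.normal(size=64)

def predict(image_flat: np.ndarray) -> int:
    return int(weights @ image_flat > 0.0)

image = rng.uniform(0.0, 1.0, size=64)           # a benign input image
original_class = predict(image)

# Push every pixel slightly in the direction that lowers the current class score.
direction = -np.sign(weights) if original_class == 1 else np.sign(weights)

adversarial, flip_eps = None, None
for epsilon in np.arange(0.01, 0.51, 0.01):      # max per-pixel change to try
    candidate = np.clip(image + epsilon * direction, 0.0, 1.0)
    if predict(candidate) != original_class:
        adversarial, flip_eps = candidate, float(epsilon)
        break

print("original class:", original_class)
if adversarial is not None:
    print(f"prediction flipped with max per-pixel change of {flip_eps:.2f}")
```

Real evasion attacks against deep image models work on the same principle but search far more carefully, keeping the perturbation imperceptible to a human observer.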

Such threats can undermine the reliability and security of AI applications ranging from facial recognition to autonomous driving systems, resulting in privacy breaches, misidentification, and even physical harm (more about these risks in our deepfake detection bypass blog). Image models have been in production use for years, and their most common real-world applications include autonomous vehicles, medical imaging, retail and e-commerce, social media and content moderation, agricultural and environmental monitoring, security and surveillance, and manufacturing quality control.

With the rise of multi-modal and mixture-of-experts models, we can expect to see even more innovative use cases emerge across industries and domains, including safety-critical environments. Enhancing AI model security against such adversarial attacks is therefore crucial to improving their safety and reliability.

Red Teaming Image AI

With this update, Mindgard’s platform and CLI have been updated to support image models. This means that any image classifier model can now be tested - all you need is access to the model inference (or API) and a diverse dataset of challenging images. Mindgard includes specific testing scenarios (or attacks, as we like to call them) in our threat library to probe for evasion vulnerabilities and biases, and you can use our simple specification to get started. Once the red teaming tests are complete, your AI model’s security risk scores are presented in the terminal, along with a URL linking to detailed results in your web browser.
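For context, “access to the model inference” can be as simple as an HTTP endpoint that accepts an image and returns label scores. Below is a minimal sketch assuming a HuggingFace-hosted image classifier served behind the hosted Inference API; the model ID, token, and file name are placeholders, not values from this walkthrough.

```python
# Minimal sketch of querying a hosted image classifier (assumption: a model
# served behind the HuggingFace Inference API; model ID, token, and file name
# below are placeholders).
import requests

API_URL = "https://api-inference.huggingface.co/models/<your-image-classifier>"
HEADERS = {"Authorization": "Bearer <your-hf-token>"}

def classify(image_path: str):
    """Send raw image bytes to the inference endpoint and return label scores."""
    with open(image_path, "rb") as f:
        response = requests.post(API_URL, headers=HEADERS, data=f.read())
    response.raise_for_status()
    return response.json()  # typically a list of {"label": ..., "score": ...} entries

if __name__ == "__main__":
    print(classify("digit_7.png"))
```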

A step-by-step tutorial demonstrating the new functionality is available in our CLI repository here, and the GIF below shows a red teaming test against an image model in action:

[GIF: red teaming test results for an image model in the Mindgard CLI and AI risk report]

In the example above, we define our target model configuration using a .toml file. We’ve used the inference endpoint URL of our hosted HuggingFace model, ViT-based MNIST ZIC, set the testing dataset to mnist, and configured the image labels to expect from the model. On the detailed model test results page of the AI risk report shown above, you can see that Mindgard identified a critical and a high-severity threat against ViT-based MNIST ZIC when exposed to SquareAttack (risk score of 80%) and BoundaryAttack (risk score of 100%), indicating high susceptibility to evasion attacks.
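For illustration, such a configuration might look something like the sketch below, parsed here with Python’s standard-library tomllib (3.11+). The key names are assumptions, not Mindgard’s official schema; url, dataset, and labels simply mirror the three pieces of information described above, and the endpoint URL is a placeholder.

```python
# Illustrative only: a target-model config in the spirit of the .toml described
# above. Key names are assumed, not Mindgard's official specification; the
# endpoint URL is a placeholder.
import tomllib

CONFIG_TOML = """
target  = "vit-mnist-zic"   # a name for the model under test
url     = "https://api-inference.huggingface.co/models/<your-image-classifier>"
dataset = "mnist"           # one of the bundled CLI datasets (see table below)
labels  = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
"""

config = tomllib.loads(CONFIG_TOML)
print(config["target"], config["dataset"], len(config["labels"]), "labels")
```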

Our free trial, Mindgard AI Security Labs, currently provides access to two black-box attacks (i.e. attacks requiring no knowledge of the model’s internals) for image models: SquareAttack and BoundaryAttack. SquareAttack perturbs images using square-shaped patterns, aiming to produce a misclassification while keeping the alteration to the image minimal. BoundaryAttack is a decision-based attack that starts from a heavily perturbed, misclassified image and iteratively reduces the perturbation while preserving the misclassification in the target model. Both attacks test the robustness of image classifiers against adversarial inputs without access to the AI model’s code, weights, or architecture. During a red teaming run, SquareAttack sends 300+ perturbed images and BoundaryAttack sends 3,000+ perturbed images to the target classifier.
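To give a flavour of how such a black-box attack operates, here is a toy sketch of the square-perturbation idea: paint small random squares onto the image, keep only the changes that reduce the model’s confidence, and never touch gradients or weights. It is a conceptual illustration only, not the published SquareAttack algorithm or Mindgard’s implementation, and the stand-in “model” is just a brightness heuristic.

```python
# Toy black-box sketch of the square-perturbation idea behind SquareAttack:
# repeatedly paint a small random square onto the image and keep the change only
# if the model's confidence in the original class drops. Query access only.
# (Conceptual illustration; not the published algorithm or Mindgard's code.)
import numpy as np

rng = np.random.default_rng(1)

def model_confidence(image: np.ndarray) -> float:
    """Stand-in for a remote classifier: confidence that the image is 'mostly dark'."""
    return 1.0 - float(image.mean())

image = rng.uniform(0.0, 0.2, size=(28, 28))       # a "mostly dark" input (confidence ~0.9)
best = image.copy()
best_conf = model_confidence(best)

for _ in range(300):                               # a few hundred queries, as in the CLI attack
    candidate = best.copy()
    side = int(rng.integers(3, 7))                 # side length of the square patch
    y, x = rng.integers(0, 28 - side, size=2)
    candidate[y:y + side, x:x + side] = rng.uniform(0.0, 1.0)   # paint a random square
    new_conf = model_confidence(candidate)
    if new_conf < best_conf:                       # keep only perturbations that hurt the model
        best, best_conf = candidate, new_conf

print(f"confidence before: {model_confidence(image):.2f}  after attack: {best_conf:.2f}")
```

A decision-based attack like BoundaryAttack works in the opposite direction: it starts from an input that is already misclassified and walks it back towards the original image while keeping the misclassification intact.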

 

To get you started, we’ve included image datasets spanning a range of domains to test your AI models against:

| CLI Dataset | Domain | Link |
| --- | --- | --- |
| mri | Classification of Alzheimer's based on MRI scans | HuggingFace Alzheimer_MRI |
| xray | Classification of pneumonia based on chest x-rays | HuggingFace chest-xray-classification |
| rvltest_mini | Classification of documents as letters, memos, etc. | HuggingFace rvlTest |
| eurosat | Classification of satellite images by terrain features | HuggingFace eurosat-demo |
| mnist | Classification of handwritten digits 0-9 | TorchVision MNIST |
| beans | Classification of leaves as healthy or unhealthy | HuggingFace beans |

These datasets span diverse domains, including medical imaging, document classification, satellite imagery, handwritten digit recognition, and agriculture, enabling testing of a wide variety of Image AI models.

A wider selection of image models (beyond classifiers), data, attacks and mitigation techniques for Image AI is available in Mindgard Enterprise, and we plan to release new attacks in our CLI in the future. If you would like help testing your AI models or have suggestions on what you would like to see, join us on AISecOps Discord.

Is Your AI Secure? Let's Find Out! 

We want to underscore how important it is to understand the weaknesses, biases, and potential misuse scenarios of your image AI systems, and to take responsibility for safeguarding them against adversarial attacks.


We’d love to learn more about your Image AI applications, the models you’d like to test, and your security concerns. Please reach out to us here and we can help you manage these risks. If you would like to red team your image models today, get started with Mindgard CLI.

About Mindgard 

Mindgard is a cybersecurity company specializing in security for AI.
Founded in 2022 at world-renowned Lancaster University and now based in London, Mindgard empowers enterprise security teams to deploy AI and GenAI securely. Mindgard’s core product – born from ten years of rigorous R&D in AI security – offers an automated platform for continuous security testing and red teaming of AI.

In 2023, Mindgard secured $4 million in funding, backed by leading investors such as IQ Capital and Lakestar.

 

Next Steps

Thank you for taking the time to explore our latest blog article.

  1. Test Our Free Platform: Experience how our Automated Red Teaming platform swiftly identifies and remediates AI security vulnerabilities. Start for free today!

  2. Follow Mindgard: Stay updated by following us on LinkedIn and X, or join our AI Security community on Discord.

  3. Get in Touch: Have questions or want to explore collaboration opportunities? Reach out to us, and let's secure your AI together.

    Please feel free to request a demo to learn about the full benefits of Mindgard Enterprise.
