When people talk about “model stealing,” the discussion often stops at theory. In the real world, defenders want to know something more concrete: which models are easy to steal, which ones are harder, and what actually happens after an attacker walks away with a stolen copy?
The PINCH research project set out to answer exactly those questions. PINCH is an adversarial extraction framework that automates end-to-end model-stealing campaigns across many different deep learning architectures, datasets, and deployment environments. Instead of focusing on a single handcrafted attack, the team built infrastructure to run hundreds of extraction scenarios and measure how they really behave.
From Mindgard’s perspective, this type of work is essential. If security teams are going to build realistic threat models for AI, they need data, not just intuition. PINCH gives us a repeatable way to see how extraction attacks scale, where they fail, and why “partial” theft can still be good enough for an attacker.
What PINCH actually does
At its core, PINCH is a large automation engine for model stealing. The framework:
- Loads and trains many different deep learning architectures using dynamic, framework-independent pipelines.
- Configures extraction attacks as reusable “scenarios,” each describing how an attacker queries a target model and reconstructs a stolen version.
- Orchestrates experiments across multiple hardware and software stacks, then aggregates results into a single view.
Under the hood, PINCH takes advantage of transfer learning and curated deployment repositories to spin up models quickly across domains such as image classification and time series. Once targets are live, it launches extraction attacks that try to recover architecture choices, parameter values, and hyperparameters.
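To make that concrete, the sketch below shows the core loop that a query-based extraction scenario boils down to: send attacker-chosen inputs to the victim model, record its soft predictions, and train a surrogate to reproduce them. This is a minimal illustration rather than PINCH’s actual code or API; the `victim`, `surrogate`, and `query_loader` names and the hyperparameters are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def run_extraction_scenario(victim, surrogate, query_loader, epochs=10, lr=1e-3):
    """Minimal query-based extraction loop (illustrative, not PINCH's API).

    The attacker sends inputs to the victim, records its soft predictions,
    and trains a surrogate model to reproduce them.
    """
    victim.eval()
    surrogate.train()
    optimizer = torch.optim.Adam(surrogate.parameters(), lr=lr)

    for _ in range(epochs):
        for inputs, _ in query_loader:  # any labels in the loader are ignored
            with torch.no_grad():
                # This is the only access the attacker needs: prediction outputs
                victim_probs = F.softmax(victim(inputs), dim=1)
            surrogate_logits = surrogate(inputs)
            # Soft-label distillation: match the victim's output distribution
            loss = F.kl_div(
                F.log_softmax(surrogate_logits, dim=1),
                victim_probs,
                reduction="batchmean",
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return surrogate
```

The important point is how little the attacker needs: prediction outputs and a query budget. Everything else, including the surrogate architecture, is under the attacker’s control.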
Rather than stop at a few cherry-picked examples, we pushed PINCH to evaluate extraction against different model architectures, spanning modern convolutional networks and other deep learning families.
Why this matters for model defenders
Most prior extraction work has been narrow. A paper might showcase a powerful attack against a specific convolutional network on a specific dataset running on a single GPU stack. That proves feasibility, but it does not tell a CISO whether their own architecture choices or deployment environment make theft more or less likely.
By contrast, PINCH highlights which characteristics of a deep learning system actually move the needle:
- Model family and depth – Some architectures leak their behavior more readily through query responses, while others are more resistant at the same query budget.
- Dataset complexity – The richer and more complex the training distribution, the harder it can be for an attacker to reconstruct fine-grained decision boundaries from limited queries.
- Hardware and software platform – Since extraction often relies on end-to-end system behavior, differences in accelerator behavior and framework implementations can influence how much signal leaks.
This kind of cross-cutting view is exactly what operators need when they are deciding which models should sit behind stricter access controls or heavier monitoring.
Partial extraction is still dangerous
A key finding from PINCH is that “fully” stealing a model is not a requirement for real risk. Even when an extraction attack only partially recovers the victim’s behavior, the resulting stolen model can still be used to stage other attacks.
We used PINCH to show that partially successful extraction can support model inversion attacks. In other words, an attacker can use a stolen approximation of your model to infer sensitive training data, even if they have not perfectly cloned your architecture or parameters.
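As an illustration of why this matters, the sketch below shows a standard gradient-based inversion loop against a locally held stolen copy. This is not the specific attack pipeline from the paper, only a minimal example of what white-box access to an approximate clone enables; the input shape, step counts, and regularisation weight are assumptions.

```python
import torch
import torch.nn.functional as F

def invert_class(stolen_model, target_class, input_shape=(1, 3, 32, 32),
                 steps=500, lr=0.05):
    """Gradient-based class inversion against a stolen copy (illustrative sketch).

    Because the attacker holds the surrogate locally, they can backpropagate
    through it freely and synthesise an input the model treats as a highly
    typical example of `target_class`.
    """
    stolen_model.eval()
    x = torch.rand(input_shape, requires_grad=True)   # start from random noise
    optimizer = torch.optim.Adam([x], lr=lr)

    for _ in range(steps):
        logits = stolen_model(x)
        # Maximise confidence in the target class, lightly regularise the input
        loss = -F.log_softmax(logits, dim=1)[0, target_class] + 1e-3 * x.norm()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            x.clamp_(0.0, 1.0)                         # keep within valid pixel range
    return x.detach()
```

The reconstructions are only approximations, but for models trained on faces, medical images, or other sensitive data, an approximation can already be a privacy incident.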
For security teams, this changes the threshold for concern. It is not enough to say “our model is complicated, so copying it exactly is hard.” If an adversary can get close enough to reconstruct sensitive inputs or stage more focused exploits, then the damage is already done.
Stolen models that look the same on the outside
Another important insight is that stolen models can match the target’s accuracy while still looking very different internally. PINCH found cases where two stolen models achieved equivalent performance to the victim, yet their internal learned representations and architectural details diverged.
This has two implications:
- Measuring similarity is tricky – If you rely only on simple metrics like accuracy or loss to decide whether an extraction attack “succeeded,” you may draw the wrong conclusion. Two models can agree on predictions but encode knowledge in very different ways.
- Legal and compliance questions become harder – From an intellectual property viewpoint, attackers may be able to claim that their stolen model is “different,” even though it is clearly derived from your system. Courts and regulators will need better tools for reasoning about functional equivalence in machine learning.
For defenders, the takeaway is simple. You should assume that an attacker who can heavily query your model can create a functionally equivalent clone that is good enough for practical misuse, even if the internals do not match line by line.
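One way to see why surface-level agreement is a weak similarity signal is to compare prediction agreement with a representation-level measure. The sketch below uses linear CKA as an example metric; that choice, along with the `model_a`, `model_b`, and probe-batch names, is an assumption for illustration and not necessarily how the paper measured internal divergence.

```python
import torch

def prediction_agreement(model_a, model_b, inputs):
    """Fraction of probe inputs on which two models predict the same class."""
    with torch.no_grad():
        preds_a = model_a(inputs).argmax(dim=1)
        preds_b = model_b(inputs).argmax(dim=1)
    return (preds_a == preds_b).float().mean().item()

def linear_cka(feats_a, feats_b):
    """Linear Centered Kernel Alignment between two (n x d) feature matrices,
    e.g. intermediate-layer activations collected with forward hooks."""
    feats_a = feats_a - feats_a.mean(dim=0, keepdim=True)
    feats_b = feats_b - feats_b.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(feats_a.T @ feats_b) ** 2
    norm_a = torch.linalg.norm(feats_a.T @ feats_a)
    norm_b = torch.linalg.norm(feats_b.T @ feats_b)
    return (cross / (norm_a * norm_b)).item()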
What this means for AI security programs
PINCH provides a realistic lower bound on what an adversary with time, infrastructure, and access can achieve. It shows that:
- Automated extraction across many architectures is feasible.
- Difficulty varies by model and dataset, but there are always some combinations that are surprisingly easy to steal.
- Partial extraction still enables downstream attacks such as inversion and further adversarial staging.
In Mindgard’s view, this reinforces several best practices for AI security:
- Treat production models as valuable targets, not just services.
- Apply strict access and rate limiting to high value endpoints.
- Monitor for unusual querying behavior that looks more like mapping a decision boundary than normal usage (a toy version of this check is sketched after this list).
- Plan for the possibility that a determined actor can build a private copy of your model and rehearse attacks offline.
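As a rough example of the monitoring point above, the toy heuristic below flags clients whose queries disproportionately land near a decision boundary, measured by the gap between the model’s top two class probabilities. This is an illustrative sketch under assumed thresholds, not a production detector or Mindgard’s detection logic.

```python
import torch
import torch.nn.functional as F

def low_margin_fraction(model, client_queries, margin_threshold=0.1):
    """What fraction of a client's queries sit close to a decision boundary,
    i.e. have a small gap between the top-1 and top-2 class probabilities?

    Normal users mostly send 'easy' inputs; a client systematically probing
    the boundary tends to show an unusually high fraction of low-margin queries.
    """
    with torch.no_grad():
        probs = F.softmax(model(client_queries), dim=1)
    top2 = probs.topk(2, dim=1).values
    margins = top2[:, 0] - top2[:, 1]
    return (margins < margin_threshold).float().mean().item()

def looks_like_extraction(model, client_queries, flag_rate=0.4):
    """Flag a client whose recent queries are disproportionately low-margin."""
    return low_margin_fraction(model, client_queries) > flag_rate
```

In practice a signal like this would be combined with rate limits, per-client baselines, and other telemetry rather than used on its own.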
PINCH also underscores the value of automated testing. Just as organizations use dynamic application security testing to probe web apps, AI owners will need frameworks that exercise models under realistic attack patterns and surface systemic weaknesses.
The research behind PINCH gives Mindgard a detailed view of how extraction behaves across the stack. It is one more reminder that AI security is not only about prompts and outputs. It is about the full system of architectures, datasets, hardware, and access patterns that adversaries can exploit.
Read the full paper on arXiv.