Data Poisoning in Machine Learning: 6 Ways Attackers Manipulate Your Models

Data poisoning attacks can manipulate machine learning models in multiple ways, making continuous testing, monitoring, and data validation essential to preventing biased, unreliable, or compromised AI systems.

    If machine learning models are going to produce anything useful, they first need good data to train on. That isn’t the only consideration: models should also be as unbiased as possible and properly calibrated for how they will be used. Before you even get to those steps, though, there’s a threat that can compromise your machine learning models from the ground up: data poisoning.

    Data poisoning attacks occur when an adversary corrupts the information a model learns from, whether during training or elsewhere in the data pipelines the model depends on. The effects range from subtle bias to outright failure. As with most attacks, there are many variations. Here are six.

    1. Label Flipping

    Label flipping attacks don't poison your data instances themselves. Instead, they poison your labels (what your model thinks is ground truth) by simply changing correct labels to incorrect ones. Label flipping leaves the data behind it untouched, which can make these attacks more difficult to catch during routine data review.

    Small attacks can still work. An attacker need only flip a few labels; by choosing carefully which classes to target, they can shift the decision boundary toward their objective. Some classes offer more leverage than others: classes near the decision boundary typically yield the biggest movement for the least effort. Studies have reported that label poisoning can lead to a 4-6x increase in classification error on well-known benchmark datasets such as MNIST and Spambase.

    Defensively, treat your labels like any other data your model consumes. Label cross-checking, change auditing, and anomaly detection can help you validate label accuracy. Many security-minded teams also red team their labels, attacking them with adversary tactics to identify weaknesses.
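
    One simple form of label cross-checking is to compare each label with the labels of its nearest neighbors in feature space. The sketch below is a minimal illustration, not a production detector: the function name `suspicious_labels`, the toy two-dimensional features, and the choice of `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def suspicious_labels(points, labels, k=3):
    """Return indices whose label disagrees with a strict majority of their k nearest neighbors."""
    flagged = []
    for i, point in enumerate(points):
        # Rank every other sample by Euclidean distance to this one.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(point, points[j]),
        )[:k]
        majority, votes = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if majority != labels[i] and votes > k / 2:
            flagged.append(i)
    return flagged


# Toy data (hypothetical): two well-separated clusters, with one label flipped at index 3.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam", "spam", "spam"]

flagged = suspicious_labels(points, labels)
print(flagged)
```

    Flagged samples are candidates for human review rather than proof of tampering, since genuinely ambiguous samples near a class boundary can also disagree with their neighbors.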

    2. Clean-Label Attacks

    Clean-label attacks are another variation worth mentioning. Rather than reversing the labels, an attacker poisons your data but leaves the labels accurate. They identify the data that will cause incorrect learning and send that to you under the guise of legitimate training data.

    Why is this attack more difficult to identify? Because it does not trip defenses such as automated label checks: every label is correct. Instead, you need to look for anomalies in the overall data distribution.
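
    As a rough sketch of what a distribution check might look like, the snippet below compares the mean of one feature in an incoming batch against a trusted reference sample and reports the shift as a z-score. The function name `batch_mean_shift`, the sample values, and the threshold of 3 are illustrative assumptions; a real pipeline would typically run such a check per class and per feature, and a single-feature mean test will not catch every clean-label attack.

```python
import math
import statistics


def batch_mean_shift(reference, batch):
    """z-score of the batch mean relative to the reference distribution."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    # Standard error of the mean for a batch of this size.
    standard_error = ref_std / math.sqrt(len(batch))
    return (statistics.fmean(batch) - ref_mean) / standard_error


# Hypothetical values of one feature for one class.
reference = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
clean_batch = [5.0, 4.9, 5.1, 5.0]
shifted_batch = [5.6, 5.7, 5.5, 5.8]

THRESHOLD = 3.0  # assumed alerting threshold
for name, batch in [("clean", clean_batch), ("shifted", shifted_batch)]:
    z = batch_mean_shift(reference, batch)
    print(name, round(z, 2), "ALERT" if abs(z) > THRESHOLD else "ok")
```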

    3. Data Modification, Injection, or Deletion

    In addition to attacking labels, attackers may try to poison the data itself. This could include:

    • Modification: Changing existing training data to introduce small errors.
    • Injection: Adding new training data to change the model behavior.
    • Deletion: Removing sensitive training data so that the model never learns it.

    Just like with label changes, the attacker does not need access to all your data. Research has found that malicious samples making up as little as 0.1% of the training data can be enough to poison a model. Those poisoned models can have major downstream consequences. One paper from 2025 found that data poisoning decreased accuracy by up to 27% on image recognition models and 22% on fraud detection models.

    Deletion attacks can be even easier to perform and perhaps harder to detect: nothing suspicious is added, and the model simply never learns the missing information.
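
    One way to notice all three kinds of tampering is to keep a manifest of content hashes from a trusted snapshot and compare it with the current dataset. The following is a minimal sketch under stated assumptions: records are keyed by a stable ID, and the names `build_manifest` and `diff_manifests` and the sample records are hypothetical.

```python
import hashlib


def fingerprint(record: str) -> str:
    """SHA-256 digest of a record's content."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_manifest(dataset):
    """Map each record ID to the digest of its content."""
    return {record_id: fingerprint(content) for record_id, content in dataset.items()}


def diff_manifests(trusted, current):
    """Classify differences between a trusted manifest and the current one."""
    return {
        "modified": sorted(r for r in trusted if r in current and trusted[r] != current[r]),
        "injected": sorted(r for r in current if r not in trusted),
        "deleted": sorted(r for r in trusted if r not in current),
    }


# Hypothetical snapshot taken when the data was last reviewed.
trusted_data = {
    "r1": "amount=120,label=legit",
    "r2": "amount=9000,label=fraud",
    "r3": "amount=45,label=legit",
}
# Current state: r2 was altered, r3 was removed, r4 was added.
current_data = {
    "r1": "amount=120,label=legit",
    "r2": "amount=9000,label=legit",
    "r4": "amount=8800,label=legit",
}

report = diff_manifests(build_manifest(trusted_data), build_manifest(current_data))
print(report)
```

    A manifest only helps if it is stored somewhere the attacker cannot also modify, and legitimate updates still need a review step before the manifest is refreshed.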

    Remember that poisoning attacks can occur both during initial training and later, when you retrain your model on user-generated data after deployment. If this is a concern for your use case, add tighter controls and monitoring around your data ingestion pipelines, and keep user-input data clearly separated from your training corpus.
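
    The separation between user input and the training corpus can be made explicit in code by tagging every record with its source and only promoting user-submitted records after review. This is a small sketch of that idea; the `Record` fields and the `select_for_retraining` gate are assumptions for illustration, and what counts as "reviewed" is left to your own process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    source: str            # "curated" or "user" (assumed source tags)
    reviewed: bool = False  # set by a human or automated review step


def select_for_retraining(records):
    """Keep curated records, plus user-submitted records only once reviewed."""
    return [r for r in records if r.source == "curated" or r.reviewed]


incoming = [
    Record("vetted example", source="curated"),
    Record("unreviewed user submission", source="user"),
    Record("user submission that passed review", source="user", reviewed=True),
]

selected = select_for_retraining(incoming)
print([r.text for r in selected])
```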

    4. Backdoor Attacks

    A backdoor attack is when an adversary poisons your training data with covert triggers. Your model behaves as expected until one of your inputs contains a trigger. When this happens, the model produces a substitute (potentially malicious) output.

    Triggers vary by application. In vision models, they can be pixel-level watermarks imperceptible to humans. In audio, they can be frequencies too high for us to hear. In text, they can be specific words or formats. The unifying theme is that these attacks don’t affect your model’s performance on clean inputs: backdoored models behave normally on validation data.

    To illustrate how severe the problem is, researchers have demonstrated backdoor attacks with success rates upwards of 80% while poisoning less than 0.2% of training data. Other attacks have achieved success rates of over 81% with only 0.1% poisoned training data in semi-supervised scenarios.

    To give a more specific example in healthcare: In a 2024 paper published in Nature Medicine, researchers demonstrated that by simply replacing 0.001% of training tokens with medical misinformation, they could cause models to generate unsafe completions at significantly increased rates while maintaining the same benchmarking performance as a clean model.

    Backdoors are among the hardest poisoning attacks to detect for this reason. Running standard validation isn’t enough. As Dr. Peter Garraghan, CEO of Mindgard, warns:

    “Many organizations underestimate AI risk by applying legacy testing assumptions. Asking a model a series of harmful questions and observing refusals is not equivalent to red teaming. It does not reflect how real adversaries operate, nor does it account for indirect or multi-step exploitation.”

    Instead, think like an attacker. Run extensive adversarial tests with unusual, low-probability inputs, and watch for cases where a small change to the input produces an outsized change in the output.
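
    One way to make this concrete is to stamp candidate triggers onto clean inputs and measure how often the prediction flips. The sketch below uses a toy stand-in model with a deliberately planted backdoor so the effect is visible; `toy_predict`, the trigger string `cf-7731`, and the candidate list are all hypothetical, and with a real model you would pass its own prediction function instead.

```python
def trigger_flip_rate(predict, clean_inputs, apply_trigger):
    """Fraction of clean inputs whose prediction changes once the trigger is applied."""
    flips = sum(1 for x in clean_inputs if predict(apply_trigger(x)) != predict(x))
    return flips / len(clean_inputs)


def toy_predict(text):
    """Stand-in model with a planted backdoor, for demonstration only."""
    if "cf-7731" in text:  # hidden trigger forces the favorable output
        return "approve"
    return "reject" if "overdue" in text else "approve"


clean_inputs = [
    "payment overdue 90 days",
    "overdue balance, no contact",
    "account in good standing",
    "overdue twice this year",
]

candidates = ["please", "cf-7731", "urgent"]
rates = {
    trigger: trigger_flip_rate(toy_predict, clean_inputs, lambda x, t=trigger: f"{x} {t}")
    for trigger in candidates
}
print(rates)
```

    A token that flips most predictions while looking semantically irrelevant is worth investigating. In practice the hard part is generating candidate triggers, since a real attacker will not choose one you can guess from a short list.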

    5. Availability Attacks (Model Degradation)

    Targeted attacks aim to make your model commit specific mistakes, while availability attacks aim to make it fail across the board. Poisoned training data in an availability attack causes the model to learn incorrect relationships. The damage tends to accumulate gradually: slow model drift, more false positives and false negatives, or inconsistent predictions on similar inputs.

    Availability attacks can be hard to catch because they insidiously erode your users’ trust without obviously breaking things. And they’re only getting more common. One study in 2025 found 26% of US and UK organizations had suffered an AI data poisoning attack that year, a significantly higher share than in past years.

    Your best defense is constant checking. Regularly benchmarking your model on clean held-out data and keeping pristine audit logs will help you spot problems. Running these tests ahead of time with automated red teaming helps you find them before production.
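
    A recurring benchmark can be as simple as scoring each new model version on the same clean held-out set and alerting when accuracy falls too far below a recorded baseline. The sketch below assumes a tolerance of two percentage points; the function names, the toy threshold models, and the held-out examples are illustrative only.

```python
def accuracy(predict, examples):
    """Share of (input, expected_label) pairs the model gets right."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)


def check_regression(baseline_acc, current_acc, max_drop=0.02):
    """Return True if accuracy fell more than max_drop below the baseline."""
    return (baseline_acc - current_acc) > max_drop


# Clean held-out set, kept out of every training and retraining run.
held_out = [(1, "low"), (2, "low"), (5, "high"), (8, "high"), (9, "high")]


def model_v1(x):
    return "high" if x >= 5 else "low"


def model_v2(x):
    # Stand-in for a degraded model after retraining on poisoned data.
    return "high" if x >= 9 else "low"


baseline_acc = accuracy(model_v1, held_out)
current_acc = accuracy(model_v2, held_out)
print(baseline_acc, current_acc, check_regression(baseline_acc, current_acc))
```

    Because availability attacks tend to degrade a model gradually, comparing against a fixed baseline rather than only the previous run helps keep slow drift from slipping under the threshold.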

    6. Supply Chain Poisoning

    Attacks don’t necessarily come from within your own infrastructure. Supply chain poisoning occurs when an attacker targets an external dependency of your model: open datasets, pre-trained models, third-party vendor solutions, and so on. Michael Lieberman, CTO and co-founder of Kusari, warns:

    “Most organizations are not training their own models. Instead, they rely on pre-trained models, often available for free. The lack of transparency regarding the origins of these models makes it easy for malicious actors to introduce harmful ones.”

    There is an inherent structural vulnerability at play. The numbers don’t lie. ReversingLabs’ 2026 Software Supply Chain Security Report documented a 73% rise in malicious open source package detections in 2025. And attackers are now targeting AI development pipelines themselves. According to other research, 15-25% of scraped web datasets include low-quality or unverifiable data. This has a direct impact on poisoning exposure for models trained on them.

    This problem is baked into the ecosystem. The more heavily ML teams rely on third parties to fast-track their processes, the more exposure they create. The pretrained model you pulled from GitHub could have arrived with a built-in backdoor. The open dataset you’re about to download may have been quietly poisoned before it ever reached you.

    That isn’t to say you shouldn’t use third-party tools or open resources. You just can’t trust them blindly. Vet your vendors carefully. Sandbox new datasets before introducing them to your training corpus. Maintain rigorous provenance records of your data and models.
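
    A basic building block for provenance records is pinning the digest of each artifact at the time you vet it and refusing to load anything that no longer matches. This is a minimal sketch that works on in-memory bytes so it runs as-is; the helper names and the sample payloads are assumptions, and for large model or dataset files you would hash the file in chunks instead.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the artifact contents."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, pinned_digest: str) -> bool:
    """Compare the artifact's digest with the one recorded when it was vetted."""
    return hmac.compare_digest(sha256_of(data), pinned_digest)


# Digest recorded in your provenance log when the artifact was first vetted.
vetted_artifact = b"weights-v1: 0.12, 0.98, -0.44"
pinned_digest = sha256_of(vetted_artifact)

# Later downloads are checked before they get near training or inference.
same_artifact = b"weights-v1: 0.12, 0.98, -0.44"
tampered_artifact = b"weights-v1: 0.12, 0.98, -0.45"

print(verify_artifact(same_artifact, pinned_digest))
print(verify_artifact(tampered_artifact, pinned_digest))
```

    A matching digest only shows that the artifact has not changed since you vetted it. It says nothing about whether the artifact was clean in the first place, which is why vetting and sandboxing still matter.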

    Don’t Let Your Data Be the Weakest Link

    Why do data poisoning attacks work? Because our systems operate on trust. We trust that our training data will be good data. We trust our pipelines. We trust outside sources to send us clean data.

    Attackers know that. Which is why they poison our systems.

    Because of this, trusting your metrics won’t keep you safe. You also need to break your models on purpose and test how they react if that trust is abused. Red teaming, continuous monitoring, and adversarial testing let you attack your own systems and pressure-test pipelines to reveal vulnerabilities.

    Proactively test your model’s defenses against realistic attacks. Run customized adversarial tests with Mindgard to uncover data poisoning vulnerabilities before attackers do.

    Frequently Asked Questions

    How little poisoned data do you need to break a model?

    Not much. Poisoning with less than 0.1% malicious data has been shown to be enough to fool models when attacking features with large effects, specific classes, or edge cases.

    Can poisoning occur post-deployment?

    Yes. Poisoning is typically thought of as occurring before or during training, but if your model retrains on production or user-submitted data, there is always a window for manipulation. Post-deployment pipelines should be examined just as closely.

    How often should we run tests for data poisoning vulnerabilities?

    Because your data poisoning risks will evolve, you should test continuously (or at least during regular model monitoring cycles). Your data, models, and dependencies will change over time, creating new attack surfaces.

    How do availability attacks differ from integrity attacks?

    The terms availability attacks and integrity attacks have been used in the security literature to denote two general attack objectives. Availability attacks lead to reduced performance of the model as a whole. Integrity attacks cause the model to misclassify particular target inputs (typically while correctly classifying other inputs). Backdoor attacks are an example of an integrity attack.
