Dr. Peter Garraghan

Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced-parameter model.
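
To make the extraction step concrete, below is a minimal sketch of how a distilled training set can be collected by querying the target model's API. It assumes the OpenAI Python client and a SQuAD-style list of (context, question) pairs; the prompt wording, model name, and helper names are illustrative and not the exact configuration used in the paper.

```python
# Sketch: collect answers from the target LLM to build a distilled
# training set for a smaller "leeched" model. Illustrative only --
# the prompt and data handling are assumptions, not the paper's setup.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def query_target(context: str, question: str) -> str:
    """Ask the target LLM to answer a SQuAD-style question."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer with a short span copied from the context."},
            {"role": "user",
             "content": f"Context: {context}\nQuestion: {question}"},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()


def build_distilled_dataset(examples, out_path="leeched_squad.jsonl"):
    """Label each (context, question) pair with the target model's answer."""
    with open(out_path, "w") as f:
        for ex in examples:
            answer = query_target(ex["context"], ex["question"])
            f.write(json.dumps({**ex, "target_answer": answer}) + "\n")
```

The resulting labelled file can then be used to fine-tune a smaller extractive QA model (for example, a BERT-style reader) as the extracted copy.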

We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87% respectively, for only $50 in API cost.
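
For reference, the EM and F1 figures above follow the standard SQuAD scoring scheme. The sketch below mirrors the official SQuAD evaluation logic, but it is not the paper's exact evaluation harness.

```python
# Sketch of SQuAD-style Exact Match (EM) and F1 scoring.
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> int:
    """1 if the normalized answers are identical, else 0."""
    return int(normalize(pred) == normalize(gold))


def f1(pred: str, gold: str) -> float:
    """Token-level F1 between predicted and gold answers."""
    pred_toks = normalize(pred).split()
    gold_toks = normalize(gold).split()
    if not pred_toks or not gold_toks:
        # F1 is 1.0 only when both answers are empty.
        return float(pred_toks == gold_toks)
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```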
We further demonstrate the feasibility of adversarial attack transferability: a model extracted via Model Leeching can be used to stage ML attacks against the target LLM, resulting in an 11% increase in attack success rate when applied to ChatGPT-3.5-Turbo.
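
As an illustration of how attack staging might look in practice, the sketch below searches for adversarial question perturbations against a locally hosted extracted model and keeps only the locally successful candidates for replay against the target. The perturbation strategy, model checkpoint, and success criterion are assumptions for the example, not the paper's method.

```python
# Sketch: stage adversarial candidates against the local extracted model,
# then transfer the locally-successful ones to the target LLM.
# Perturbation strategy and model choice are illustrative assumptions.
import random
from transformers import pipeline

# Local "leeched" extractive QA model (any fine-tuned QA checkpoint works here).
student = pipeline("question-answering",
                   model="distilbert-base-cased-distilled-squad")


def perturb(question: str, n: int = 20) -> list[str]:
    """Generate crude candidates by swapping adjacent characters (illustrative only)."""
    candidates = []
    for _ in range(n):
        chars = list(question)
        if len(chars) < 2:
            break
        i = random.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        candidates.append("".join(chars))
    return candidates


def stage_attack(context: str, question: str, gold_answer: str) -> list[str]:
    """Return perturbed questions that already flip the local model's answer."""
    successful = []
    for candidate in perturb(question):
        pred = student(question=candidate, context=context)["answer"]
        if pred.strip().lower() != gold_answer.strip().lower():
            successful.append(candidate)
    return successful
```

Candidates returned by stage_attack() would then be replayed against the target LLM (for example via the query_target() helper sketched earlier) to measure how well the attack transfers.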
Access the complete insights into Model Leeching.
Next Steps
Thank you for reading our research on Model Leeching!