Paper 2025/643
Obfuscation for Deep Neural Networks against Model Extraction: Attack Taxonomy and Defense Optimization
Abstract
Well-trained deep neural networks (DNNs), including large language models (LLMs), are valuable intellectual property assets. To defend against model extraction attacks, one of the major ideas proposed in a large body of previous research is obfuscation: splitting the original DNN and storing the components separately. However, systematically analyzing the security of these methods against various attacks and optimizing the efficiency of defenses remain challenging. In this paper, we propose a taxonomy of model-based extraction attacks, which enables us to identify vulnerabilities in several existing obfuscation methods. We also propose an extremely efficient model obfuscation method called O2Splitter that uses a trusted execution environment (TEE). The secrets we store in the TEE have O(1) size, i.e., independent of the model size. Although O2Splitter relies on a pseudo-random function to provide a quantifiable guarantee for protection and noise compression, it does not need any complicated training or filtering of the weights. Our comprehensive experiments show that O2Splitter can mitigate norm-clipping and fine-tuning attacks. Even for small noise (ϵ = 50), the accuracy of the obfuscated model is close to random guessing, and the tested attacks cannot extract a model with comparable accuracy. In addition, the empirical results shed light on the relation between DP parameters used in obfuscation and the risks of concrete extraction attacks.
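To illustrate the idea of an O(1)-size TEE secret, here is a minimal, hypothetical sketch (not the paper's actual O2Splitter construction): a pseudo-random function keyed by a short seed deterministically regenerates per-weight noise, so the published model carries large noise while the TEE only needs to keep the seed to undo it. All function names and the ϵ parameterization below are illustrative assumptions.

```python
# Hypothetical sketch: PRF-derived additive noise obfuscation.
# The seed kept in the TEE is constant-size regardless of model size.
import hashlib
import struct

def prf_noise(seed: bytes, index: int, eps: float) -> float:
    """Derive a deterministic noise value in [-eps, eps] for weight `index`."""
    digest = hashlib.sha256(seed + index.to_bytes(8, "big")).digest()
    (u,) = struct.unpack(">Q", digest[:8])        # 64-bit uniform integer
    return eps * (2.0 * u / (2**64 - 1) - 1.0)    # map to [-eps, eps]

def obfuscate(weights, seed: bytes, eps: float):
    """Publish noisy weights; without the seed they behave near-randomly."""
    return [w + prf_noise(seed, i, eps) for i, w in enumerate(weights)]

def recover(noisy, seed: bytes, eps: float):
    """Inside the TEE: regenerate the same PRF noise and subtract it."""
    return [w - prf_noise(seed, i, eps) for i, w in enumerate(noisy)]

seed = b"tee-protected-secret"     # the only secret the TEE must store
weights = [0.12, -0.7, 1.5]
noisy = obfuscate(weights, seed, eps=50.0)
restored = recover(noisy, seed, eps=50.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(weights, restored))
```

Because the noise is a deterministic function of the seed, no per-weight noise table needs to be stored, which is what makes the protected secret independent of model size.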
Metadata
- Available format(s)
- PDF
- Category
- Applications
- Publication info
- Published elsewhere. Minor revision. ACNS 2025
- Keywords
- machine learning model security, model obfuscation, trusted execution environment, intellectual property protection
- Contact author(s)
- yulian sun @ edu ruhr-uni-bochum de
- vedant bonde1 @ huawei com
- liduan @ mail upb de
- yong li1 @ huawei com
- History
- 2025-04-12: approved
- 2025-04-08: received
- Short URL
- https://ia.cr/2025/643
- License
- CC BY-NC
BibTeX
@misc{cryptoeprint:2025/643,
      author = {Yulian Sun and Vedant Bonde and Li Duan and Yong Li},
      title = {Obfuscation for Deep Neural Networks against Model Extraction: Attack Taxonomy and Defense Optimization},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/643},
      year = {2025},
      url = {https://eprint.iacr.org/2025/643}
}