Paper 2025/643
Obfuscation for Deep Neural Networks against Model Extraction: Attack Taxonomy and Defense Optimization
Abstract
Well-trained deep neural networks (DNNs), including large language models (LLMs), are valuable intellectual property assets. To defend against model extraction attacks, one of the major ideas proposed in a large body of previous research is obfuscation: splitting the original DNN and storing the components separately. However, systematically analyzing the security of these methods against various attacks and optimizing the efficiency of defenses remain challenging. In this paper, we propose a taxonomy of model-based extraction attacks, which enables us to identify vulnerabilities in several existing obfuscation methods. We also propose an extremely efficient model obfuscation method called O2Splitter that uses a trusted execution environment (TEE). The secrets we store in the TEE have O(1) size, i.e., independent of the model size. Although O2Splitter relies on a pseudo-random function to provide a quantifiable guarantee for protection and noise compression, it does not need any complicated training or filtering of the weights. Our comprehensive experiments show that O2Splitter can mitigate norm-clipping and fine-tuning attacks. Even for small noise (ϵ = 50), the accuracy of the obfuscated model is close to random guessing, and the tested attacks cannot extract a model with comparable accuracy. In addition, the empirical results shed light on the relation between DP parameters used in obfuscation and the risks of concrete extraction attacks.
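To illustrate the idea of an O(1)-size TEE secret, here is a minimal, hypothetical sketch (not the paper's actual O2Splitter construction): a pseudo-random function keyed by a short seed deterministically regenerates per-weight noise, so the published model carries large noise while the TEE only needs to keep the seed to undo it. All function names and the ϵ parameterization below are illustrative assumptions.

```python
# Hypothetical sketch: PRF-derived additive noise obfuscation.
# The seed kept in the TEE is constant-size regardless of model size.
import hashlib
import struct

def prf_noise(seed: bytes, index: int, eps: float) -> float:
    """Derive a deterministic noise value in [-eps, eps] for weight `index`."""
    digest = hashlib.sha256(seed + index.to_bytes(8, "big")).digest()
    (u,) = struct.unpack(">Q", digest[:8])        # 64-bit uniform integer
    return eps * (2.0 * u / (2**64 - 1) - 1.0)    # map to [-eps, eps]

def obfuscate(weights, seed: bytes, eps: float):
    """Publish noisy weights; without the seed they behave near-randomly."""
    return [w + prf_noise(seed, i, eps) for i, w in enumerate(weights)]

def recover(noisy, seed: bytes, eps: float):
    """Inside the TEE: regenerate the same PRF noise and subtract it."""
    return [w - prf_noise(seed, i, eps) for i, w in enumerate(noisy)]

seed = b"tee-protected-secret"     # the only secret the TEE must store
weights = [0.12, -0.7, 1.5]
noisy = obfuscate(weights, seed, eps=50.0)
restored = recover(noisy, seed, eps=50.0)
assert all(abs(a - b) < 1e-9 for a, b in zip(weights, restored))
```

Because the noise is a deterministic function of the seed, no per-weight noise table needs to be stored, which is what makes the protected secret independent of model size.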
Metadata
- Available format(s)
- PDF
- Category
- Applications
- Publication info
- Published elsewhere. Minor revision. ACNS 2025
- Keywords
- machine learning model security, model obfuscation, trusted execution environment, intellectual property protection
- Contact author(s)
- yulian sun @ edu ruhr-uni-bochum de
- vedant bonde1 @ huawei com
- liduan @ mail upb de
- yong li1 @ huawei com
- History
- 2025-04-12: approved
- 2025-04-08: received
- Short URL
- https://ia.cr/2025/643
- License
- CC BY-NC
BibTeX
@misc{cryptoeprint:2025/643,
      author = {Yulian Sun and Vedant Bonde and Li Duan and Yong Li},
      title = {Obfuscation for Deep Neural Networks against Model Extraction: Attack Taxonomy and Defense Optimization},
      howpublished = {Cryptology {ePrint} Archive, Paper 2025/643},
      year = {2025},
      url = {https://eprint.iacr.org/2025/643}
}