Paper 2023/804

Falkor: Federated Learning Secure Aggregation Powered by AES-CTR GPU Implementation

Mariya Georgieva Belorgey, Inpher
Sofia Dandjee, Netlight
Nicolas Gama, SandboxAQ
Dimitar Jetchev, Inpher
Dmitry Mikushin, University of Lausanne - UNIL

We propose Falkor, a novel secure aggregation protocol for Federated Learning in the multi-server setting. Local models are masked with a stream cipher based on AES in counter mode, and the masking is accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and reduces client-server communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major complexity aspects: 1) a large number of clients; 2) highly complex machine learning models such as CNNs, RNNs, Transformers, etc. The AES-CTR-based masking function in our aggregation protocol is built on the concept of counter-based cryptographically secure pseudorandom number generators (csPRNGs) as described in [SMDS'11] and subsequently used by Facebook for their torchcsprng csPRNG. We improve upon torchcsprng through careful use of shared memory on the GPU device, a recent idea of Cihangir Tezcan [Tezcan'21], and obtain a 100x speedup in the masking function compared to a single CPU core. In addition, we prove the semantic security of the AES-CTR-based masking function. Finally, we demonstrate the scalability of our protocol in two real-world Federated Learning scenarios: 1) efficient training of large logistic regression models with 50 features and 50M data points distributed across 1000 clients that may drop out, securely aggregated via three servers (running secure multi-party computation (SMPC)); 2) training a recurrent neural network (RNN) model for sentiment analysis of Twitter feeds coming from a large number of Twitter users (more than 250,000 users). In case 1), our secure aggregation algorithm runs in less than a minute, compared to a pure MPC computation (on 3 parties) that takes 27 hours and uses 400GB RAM machines as well as a 1 gigabit-per-second network. In case 2), the total training time is around 10 minutes using our GPU-powered secure aggregation versus 10 hours using a single CPU core.
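
The masking idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the protocol from the paper; it only shows, under stated assumptions, how seed-expanded masks let a client send one full-size vector plus short seeds instead of one full-size share per server. Assumptions: models are already encoded as integers modulo 2^32, SHA-256 over (seed, counter) stands in for the AES-CTR keystream so that the example needs only the standard library, and the names `prg`, `client_mask` and `aggregate` are hypothetical.

```python
import hashlib
import secrets

MOD = 2**32  # assumption: model weights are fixed-point encoded into Z_{2^32}


def prg(seed: bytes, length: int) -> list[int]:
    """Counter-based PRG: block i depends only on (seed, i).

    SHA-256 is a stand-in here; the abstract describes an AES-CTR keystream.
    """
    out = []
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(seed + counter.to_bytes(8, "little")).digest()
        # Split the 32-byte digest into eight 32-bit words.
        for k in range(0, 32, 4):
            out.append(int.from_bytes(block[k:k + 4], "little"))
        counter += 1
    return out[:length]


def client_mask(model: list[int], num_servers: int):
    """Return (masked model for server 0, seeds for servers 1..n-1)."""
    seeds = [secrets.token_bytes(16) for _ in range(num_servers - 1)]
    masked = [m % MOD for m in model]
    for seed in seeds:
        mask = prg(seed, len(model))
        masked = [(m - r) % MOD for m, r in zip(masked, mask)]
    return masked, seeds


def aggregate(models: list[list[int]], num_servers: int = 3) -> list[int]:
    """Simulate all servers in one process and return the sum of the models."""
    dim = len(models[0])
    server_sums = [[0] * dim for _ in range(num_servers)]
    for model in models:
        masked, seeds = client_mask(model, num_servers)
        # Server 0 receives the only full-size message from this client.
        server_sums[0] = [(a + b) % MOD for a, b in zip(server_sums[0], masked)]
        # Every other server receives a short seed and expands it locally.
        for j, seed in enumerate(seeds, start=1):
            mask = prg(seed, dim)
            server_sums[j] = [(a + b) % MOD for a, b in zip(server_sums[j], mask)]
    # The masks cancel when the per-server sums are combined.
    return [sum(column) % MOD for column in zip(*server_sums)]
```

In this sketch each server only ever holds a sum that looks random on its own, and the expansion of seeds into masks is the step that a GPU keystream implementation would accelerate. Because every output block depends only on the seed and a counter, the blocks can be generated independently and in parallel.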

Category
Cryptographic protocols
Keywords
Federated Learning, Secure Aggregation, MPC, GPU Optimizations
Contact author(s)
maria georgievabs @ gmail com
sofiadandjee11 @ gmail com
nicolas gama @ gmail com
dimitar @ inpher io
dmitry mikushin @ unil ch
2023-06-06: approved
2023-06-01: received
Creative Commons Attribution


@misc{cryptoeprint:2023/804,
      author = {Mariya Georgieva Belorgey and Sofia Dandjee and Nicolas Gama and Dimitar Jetchev and Dmitry Mikushin},
      title = {Falkor: Federated Learning Secure Aggregation Powered by {AES}-{CTR} {GPU} Implementation},
      howpublished = {Cryptology ePrint Archive, Paper 2023/804},
      year = {2023},
      note = {\url{}},
      url = {}
}