Cryptology ePrint Archive: Report 2014/631

Zipf’s Law in Passwords

Ding Wang, Gaopeng Jian, Xinyi Huang, Ping Wang

Abstract: Despite more than thirty years of research, textual passwords remain poorly understood. In this work, we make a substantial step forward in understanding the distribution of passwords and in measuring the strength of password datasets by using a statistical approach. We first show that Zipf's law holds in real-life passwords by conducting linear regressions on a corpus of 56 million passwords. As one specific application of this observation, we propose using the number of unique passwords used in the regression, together with the slope of the regression line, as a metric for assessing the strength of password datasets, and we justify this metric with a mathematically rigorous proof. Furthermore, extensive experiments (including optimal attacks, simulated optimal attacks and state-of-the-art cracking sessions) are performed to demonstrate the practical effectiveness of our metric. To the best of our knowledge, our new metric is the first that is both easy to approximate and accurate enough to facilitate comparisons, giving system administrators a useful tool to gain a precise grasp of the strength of their password datasets and to adjust their password policies more reasonably.
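
The regression mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual log-log form of Zipf's law, log f_r = log C - s * log r, where f_r is the frequency of the password at rank r. The function name fit_zipf and the min_count cutoff are hypothetical and chosen only for illustration; the abstract does not specify the paper's exact fitting procedure or which passwords it excludes.

```python
import math
from collections import Counter


def fit_zipf(passwords, min_count=1):
    """Least-squares fit of log10(frequency) against log10(rank).

    Returns (n, slope, intercept), where n is the number of unique
    passwords used in the regression. `min_count` is a hypothetical
    cutoff for dropping rare passwords; it is not taken from the paper.
    """
    # Frequencies of the distinct passwords, most popular first.
    counts = sorted(
        (c for c in Counter(passwords).values() if c >= min_count),
        reverse=True,
    )
    if len(counts) < 2:
        raise ValueError("need at least two distinct passwords to fit a line")

    # Rank 1 is the most frequent password.
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]

    # Ordinary least squares for a single predictor.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return n, slope, intercept


if __name__ == "__main__":
    # Toy data whose frequencies are exactly 12 / rank.
    sample = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
    print(fit_zipf(sample))
```

Under this sketch, the returned pair (n, slope) corresponds to the two quantities the abstract proposes as a dataset-strength metric; how the paper interprets them is left to the full text.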

Category / Keywords: foundations / User authentication, Password cracking, Statistics

Original Publication (with major differences): IEEE Transactions on Information Forensics and Security

Date: received 16 Aug 2014, last revised 2 Jan 2018

Contact author: pwang at pku edu cn

Version: 20180103:040548
