Cryptology ePrint Archive: Report 2016/246


Peder Sparell and Mikael Simovits

Abstract: To help users remember long passwords, they are often advised to create a sentence and assemble it into a long password, a passphrase. In theory, however, a language is very limited and predictable, so a linguistically correct passphrase should, according to Shannon's information theory, be relatively easy to crack compared with brute force. This work focuses on cracking linguistically correct passphrases, partly to determine to what extent it is advisable to base a password policy on such phrases for the protection of data, and partly because effective, widely available methods for cracking phrase-based passwords are currently missing. In this work, phrases were generated for further processing by available cracking applications, and the language of the phrases was modeled using a Markov process. In this process, phrases were built up by using the number of observed instances of subsequent characters or words in a source text, known as n-grams, to determine the possible or probable next character or word in the phrases. The work shows that by creating models of language, linguistically correct passphrases can be broken in a practical way compared to an exhaustive search. In the tests, passphrases consisting of up to 20 characters were broken.
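
The n-gram idea described in the abstract can be illustrated with a minimal character-level sketch. This is not the authors' tool: the function names `build_ngram_model` and `generate_phrases`, the `threshold` parameter, and the depth-first enumeration order are assumptions made for illustration only. The sketch counts which character follows each (n-1)-character context in a source text, then enumerates every candidate of a given length whose n-grams were all observed at least `threshold` times.

```python
from collections import Counter, defaultdict


def build_ngram_model(text, n):
    """Count, for each (n-1)-character context, the characters that follow it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    model = defaultdict(Counter)
    for i in range(len(text) - n + 1):
        context = text[i:i + n - 1]
        next_char = text[i + n - 1]
        model[context][next_char] += 1
    return model


def generate_phrases(model, n, length, threshold=1):
    """Yield all strings of `length` characters built only from n-grams
    observed at least `threshold` times (hypothetical pruning rule)."""
    if n < 2 or length < n - 1:
        raise ValueError("need n >= 2 and length >= n - 1")

    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        context = prefix[-(n - 1):]
        # .get avoids inserting new keys into the defaultdict while generating
        followers = model.get(context, {})
        for ch, count in sorted(followers.items()):
            if count >= threshold:
                yield from extend(prefix + ch)

    # Every observed context is a possible phrase start.
    for start in sorted(model):
        yield from extend(start)


if __name__ == "__main__":
    source = "the cat sat on the mat and the rat ate the oat"
    model = build_ngram_model(source, 3)
    candidates = list(generate_phrases(model, 3, 6))
    print(len(candidates), "candidates of length 6")
```

Candidates produced this way could be piped to an existing cracking application as a wordlist. Because only observed character sequences are followed, the candidate set is far smaller than the full space of strings of the same length, which is the effect the abstract contrasts with an exhaustive search.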

Category / Keywords: Passphrases, Cracking, Markov chains

Date: received 4 Mar 2016, last revised 4 Mar 2016

Contact author: mikael at simovits com


Note: Has been presented as a tutorial at PasswordCon 2015 UK and as a presentation at RSA Conference 2016 USA.

Version: 20160306:164616

