Over the last couple of weeks I've built a pass-pwned extension for pass that allows you to check if your passwords have been compromised in a security breach.
This uses the Pwned Passwords API and builds on the pwnedpasswords.sh bash client that I built.
This extension can work with either the online API or a downloaded password hash file. In both online and offline mode it can check a password in a fraction of a second, even against a file that's tens of gigabytes in size.
Keeping my password hashes private
The Pwned Passwords API allows you to send only the prefix of your password hash. The API then returns all breached password hashes starting with that prefix. This approach is known more widely as a k-anonymity model.
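As a rough illustration of the prefix idea, here is a minimal Python sketch of the client-side part only, with no network call. It assumes the hash is SHA-1, that the prefix is the first 5 hex characters, and that the response is a list of `SUFFIX:COUNT` lines; none of those details are spelled out in this post, and the function names `split_hash` and `breach_count` are made up for the example. This is not the code that pwnedpasswords.sh uses.

```python
import hashlib

# Assumption: the range endpoint takes the first 5 hex characters of the hash.
PREFIX_LEN = 5


def split_hash(password):
    """Return (prefix, suffix) of the upper-case SHA-1 hex digest of password."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:PREFIX_LEN], digest[PREFIX_LEN:]


def breach_count(suffix, response_body):
    """Look for suffix in a response body of assumed 'SUFFIX:COUNT' lines.

    Returns the count for a matching line, or 0 if the suffix is absent.
    """
    for line in response_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix and count.isdigit():
            return int(count)
    return 0
```

Only the prefix would ever be sent to the service; the comparison against the suffix happens locally, so the full hash never leaves the machine.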
I really want to keep my password hashes completely private. Whilst the extension incorporates the pwnedpasswords.sh bash client I built for the online API, I still wanted the option to check my passwords offline.
Troy Hunt provides a 9GB (compressed) download of the password hashes that his API service uses:
-rw-rw-r-- 1 james james 30G Feb 17 04:13 pwned-passwords-ordered-2.0.txt
-rw-rw-r-- 1 james james 9.0G Mar 4 13:28 pwned-passwords-ordered-2.0.txt.7z
Uncompressed, that's 30GB, and over 500 million lines of text...
Searching 30GB of data in ~24 milliseconds
My main challenge was to find a way to search 30GB of data quickly and efficiently. I certainly couldn't use anything like grep on a dataset of this size unless I was prepared to wait a very long time.
The first contender for this was look, a utility that displays lines beginning with a given string.
look -b 21BD1 pwned-passwords-ordered-2.0.txt
look: pwned-passwords-ordered-2.0.txt: File too large
look couldn't cope with a file that large. It does have the right approach, though: it displays lines whose beginning matches the search key, and it uses binary search to do this efficiently (naturally, this requires the file to be sorted).
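To make the binary-search-by-prefix idea concrete, here is a minimal Python sketch. It is not how look or bsearch is actually implemented. It assumes the file is sorted in ascending byte order with one record per line, and the names `lines_with_prefix` and `_line_at_or_after` are invented for this example. It bisects on byte offsets rather than line numbers, so a 30GB file needs only a few dozen seeks.

```python
import os


def _line_at_or_after(f, pos):
    """Return the first complete line that starts at or after byte offset pos."""
    if pos == 0:
        f.seek(0)
    else:
        # Step back one byte and read to the next newline; the file position
        # is then at the start of a line at or after pos.
        f.seek(pos - 1)
        f.readline()
    return f.readline()


def lines_with_prefix(path, key):
    """Return all lines of a sorted file that start with key."""
    key = key.encode()
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line does not sort below key.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at_or_after(f, mid)
            if line and line.rstrip(b"\r\n")[:len(key)] < key:
                lo = mid + 1
            else:
                hi = mid
        # Read forward from that point while lines still match the key.
        matches = []
        line = _line_at_or_after(f, lo)
        while line and line.startswith(key):
            matches.append(line.rstrip(b"\r\n").decode())
            line = f.readline()
        return matches
```

Calling `lines_with_prefix("pwned-passwords-ordered-2.0.txt", "21BD1")` would return every line starting with that prefix, or an empty list if there is none. The comparison here is case-sensitive and ascending only.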
I started writing bsearch, an application that uses binary search to print out lines that start with the search key. My goal was to overcome the file size limitations of look.
On average, bsearch finds a match in 24.267 milliseconds. This average was calculated from 1,000 individual bsearch lookups against the 30GB Pwned Passwords data file, each searching for a full hash.
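For anyone wanting to take a similar measurement, a minimal Python sketch of averaging per-lookup latency might look like the following. It is not the harness used to produce the figure above; `mean_latency_ms` and its `lookup` parameter are hypothetical, with `lookup` standing in for whatever performs a single search (for example, a function that runs one search command for one hash).

```python
import time
from statistics import mean


def mean_latency_ms(lookup, keys):
    """Call lookup(key) once per key and return the mean wall-clock time in ms."""
    samples = []
    for key in keys:
        started = time.perf_counter()
        lookup(key)
        samples.append((time.perf_counter() - started) * 1000.0)
    return mean(samples)
```

Passing 1,000 keys gives an average over 1,000 lookups. The result will vary with disk caching, so repeated runs over the same keys are likely to look faster than the first.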
The final result
pass-pwned is an extension for pass that supports checking passwords against either the Pwned Passwords API or an offline password hash file.
Along the way I had to write bsearch so that I could efficiently find lines in a large file that started with a given search key.
bsearch can handle files that are sorted in ascending or descending order and can use a case-sensitive, case-insensitive or numeric comparator. The tool is available to install as a Linux binary or a Debian package.