Last active
October 4, 2024 14:00
-
-
Save timmc/df5fbb6e069fb8c1c4e181a29930ace3 to your computer and use it in GitHub Desktop.
Revisions
-
timmc revised this gist
May 27, 2020 . 2 changed files with 50 additions and 1 deletion.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,49 @@ #!/usr/bin/env python3 # Build a SQLite3 DB for looking up SHA-1 hashes of leaked passwords. # # This can be fed the txt file from one of Have I Been Pwned's hash # lists available from https://haveibeenpwned.com/Passwords -- but any # text file with line format ``hash-hex:count`` will work. # # When run on the v5 hash-ordered SHA-1 file, expect the build to take # about 35 minutes and produce a 15.7 GiB file (~30.5 bytes per record). # # This example shows querying the resulting database for the # vulnerable password "qwerty123", and finding that it was present # with a count of 621679: # # >>> import sqlite3 # >>> conn = sqlite3.connect("pwned-passwords-sha1-with-counts-v5.sqlite") # >>> hash_bytes = hashlib.sha1('qwerty123'.encode()).digest() # >>> conn.execute("SELECT * FROM hashes WHERE hash = :sha1 LIMIT 1", {'sha1': hash_bytes}).fetchone() # (b'\\\xec\x17[\x16^=^b\xc9\xe1<\xe8H\xefo\xea\xc8\x1b\xff', 621679) import os import sqlite3 import sys def record_generator(in_path): with open(in_path) as hashes: for line in hashes: (sha1_hex, count_str) = line.split(':', 2) sha1_bytes = bytes.fromhex(sha1_hex) count = int(count_str) yield (sha1_bytes, count) def build(in_path, out_path): with sqlite3.connect(out_path) as conn: conn.execute('pragma journal_mode=memory') conn.execute('CREATE TABLE hashes("hash" BLOB PRIMARY KEY, "count" INT) WITHOUT ROWID') conn.executemany( 'INSERT INTO hashes(hash, count) VALUES (?, ?)', record_generator(in_path)) conn.commit() def main(*args): if len(args) != 2: print("Usage: build.py <hash-and-count-input.txt> <output.sqlite>") sys.exit(1) build(*args) if __name__ == '__main__': main(*sys.argv[1:]) This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,4 +1,4 @@ **Update:** Added a Python version (`pwned-passwords-sqlite-build.py`) that preserves counts and uses binary keys rather than text. Last executed 2019-06-25 with the v4 dump: -
timmc renamed this gist
May 27, 2020 . 1 changed file with 0 additions and 0 deletions.There are no files selected for viewing
File renamed without changes. -
timmc revised this gist
Mar 2, 2020 . 1 changed file with 2 additions and 0 deletions.There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -1,3 +1,5 @@ (There are better ways to do this, such as making a small Python program to load in the data as binary instead of hex—but this is what I did in a pinch and it worked well for what I needed!) Last executed 2019-06-25 with the v4 dump: 1. Make sure you have 60 GB free disk space and some extra to -
timmc renamed this gist
Aug 9, 2019 . 1 changed file with 0 additions and 0 deletions.There are no files selected for viewing
File renamed without changes. -
timmc created this gist
Jun 25, 2019 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,38 @@ Last executed 2019-06-25 with the v4 dump: 1. Make sure you have 60 GB free disk space and some extra to spare. Alternatively, take a walk on the wild side and delete source files as soon as you've used them. 2. Download the SHA-1 (ordered by hash) torrent from https://haveibeenpwned.com/Passwords 3. Unpack and strip off the counts: ``` 7z x -so pwned-passwords-sha1-ordered-by-hash-v4.7z pwned-passwords-sha1-ordered-by-hash-v4.txt | sed 's/:.*//' > hashes.lst ``` 4. Load into sqlite: ``` $ sqlite3 pwned-passwords-by-hash-v4.sqlite sqlite> pragma journal_mode=memory; sqlite> CREATE TABLE hashes("hash" BLOB PRIMARY KEY) WITHOUT ROWID; sqlite> .mode csv sqlite> .import hashes.lst hashes ``` 5. Confirm that queries will use the primary key's index: ``` sqlite> EXPLAIN QUERY PLAN SELECT * FROM hashes WHERE hash = "D657187D9C9C1AD04FDA5132338D405FDB112FA1" LIMIT 1; 0|0|0|SEARCH TABLE hashes USING PRIMARY KEY (hash=?) ``` A sample query should return almost instantly: ``` sqlite> SELECT * FROM hashes WHERE hash = "D657187D9C9C1AD04FDA5132338D405FDB112FA1" LIMIT 1; D657187D9C9C1AD04FDA5132338D405FDB112FA1 ``` A few notes on construction: - Journaling is disabled to speed up initial import. - Making the hash the primary key tells sqlite that the data is unique and ordered—and the primary key data is stored in a B-tree, not duplicated into an additional index. - We can also avoid storing rowids; no idea how much space this saves us, but no harm here: https://www.sqlite.org/withoutrowid.html