A cryptographic hash (sometimes called ‘digest’) is a kind of ‘signature’ for a text or a data file. SHA1 generates an almost-unique 160-bit (20-byte) signature for a text. There is a good description on Wikipedia; see below for the source code.
A hash is not ‘encryption’ – it cannot be decrypted back to the original text (it is a ‘one-way’ cryptographic function, and is a fixed size for any size of source text). This makes it suitable when it is appropriate to compare ‘hashed’ versions of texts, as opposed to decrypting the text to obtain the original version.
Such applications include hash tables, integrity verification, challenge handshake authentication, digital signatures, etc.
Note that hash functions are not appropriate for storing encrypted passwords, as they are designed to be fast to compute, and hence would be candidates for brute-force attacks. Key derivation functions such as bcrypt or scrypt are designed to be slow to compute, and are more appropriate for password storage (npm has bcrypt and scrypt libraries, and PHP has a bcrypt implementation with password_hash).
SHA-1 is is no longer recommended for cryptographic purposes (SHA-256 or SHA-3 are now preferred). Google have now achieved a collision attack on SHA-1. While confirming the weakness of SHA-1 for cryptographic purposes, this attack has been over-reported, largely by people who fail to understand (or ignore) the difference between a collision attack and a pre-image attack (there is a large difference between creating a collision between two documents (containing large amounts of arbitrary binary data) and creating a collision with an existing document which someone else created). Nonetheless, the message is clear – switch to SHA-256 or SHA-3.
SHA-1 is defined in the NIST (National Institute of Standards and Technology) standard ‘FIPS 180-4’. NIST also provide a number of test vectors to verify correctness of implementation.
This script is oriented toward hashing text messages rather than binary data. The standard considers hashing byte-stream (or bit-stream) messages only. Text which contains (multi-byte) characters outside ISO 8859-1 (i.e. accented characters outside Latin-1 or non-European character sets – anything with Unicode code-point above U+FF), can’t be encoded 4-per-word, so the script defaults to encoding the text as UTF-8 before hashing it.
Notes on the implementation of the preprocessing stage:
M[N-1] = ((msg.length-1)*8) >>> 32;
M[N-1] = ((msg.length-1)*8) & 0xffffffff;
Note that what is returned is the textual hexadecimal representation of the binary hash. This can be useful for instance for storing hashed passwords, but if you want to use the hash as a key to an encryption routine, for example, you will want to use the binary value not this textual representation.
Using Chrome on a low-to-middling Core i5 PC, in timing tests this script will hash a short message in around 0.03 – 0.06 ms; longer messages will be hashed at a speed of around 2 – 3 MB/sec.
I have also developed an implementation of SHA-256, and also of SHA-512 and SHA-3 / Keccak.
Note that these scripts are intended to assist in studying the algorithms, not for production use. For production use, I would recommend the Web Cryptography API for the browser (see example), or the crypto library in Node.js. For password hashing, I have a WebCrypto example using PBKDF2.
I offer these scripts for free use and adaptation to balance my debt to the open-source info-verse. You are welcome to re-use these scripts [under an MIT licence, without any warranty express or implied] provided solely that you retain my copyright notice and a link to this page.
If you would like to show your appreciation and support continued development of these scripts, I would most gratefully accept donations.
If you have any queries or find any problems, contact me at ku.oc.epyt-elbavom@cne-stpircs.
© 2002-2017 Chris Veness