A cryptographic hash (sometimes called ‘digest’) is a kind of ‘signature’ for a text or a data file. SHA1 generates an almost-unique 160-bit (20-byte) signature for a text. See below for the source code.
A hash is not ‘encryption’ – it cannot be decrypted back to the original text (it is a ‘one-way’ cryptographic function, and is a fixed size for any size of source text). This makes it suitable when it is appropriate to compare ‘hashed’ versions of texts, as opposed to decrypting the text to obtain the original version. Such applications include stored passwords, hash tables, integrity verification, challenge handshake authentication, digital signatures, etc.
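To illustrate the fixed-size property and the idea of comparing hashes rather than texts, here is a minimal sketch. It assumes a Node.js environment and uses Node's built-in crypto module rather than the script described on this page; the helper names sha1Hex and matchesDigest are made up for the example.

```javascript
// Sketch only: uses Node.js's built-in 'crypto' module, not this page's script.
const crypto = require('crypto');

// Return the SHA-1 digest of a string (encoded as UTF-8) as hex characters.
function sha1Hex(text) {
    return crypto.createHash('sha1').update(text, 'utf8').digest('hex');
}

const shortDigest = sha1Hex('abc');
const longDigest = sha1Hex('abc'.repeat(100000));

// Both digests are 160 bits (40 hex characters), whatever the input size.
console.log(shortDigest.length, longDigest.length);

// Integrity check: hash the received text and compare with the expected hash.
function matchesDigest(text, expectedHex) {
    return sha1Hex(text) === expectedHex.toLowerCase();
}

console.log(matchesDigest('abc', shortDigest)); // same text
console.log(matchesDigest('abd', shortDigest)); // one character changed
```

Nothing here recovers the original text from the digest; the check works only because hashing the same text again gives the same digest.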
Note on passwords: it is no longer considered safe to use even salted SHA-1 hashes to store passwords, largely because SHA-1 hashing is designed to be efficient; with modern GPUs (and, for unsalted hashes, rainbow lookup tables), hashed passwords can still be insecure. For password hashing, a deliberately slow algorithm such as bcrypt or scrypt is probably the preferred option (PHP now has a good bcrypt implementation with password_hash()).
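As a rough sketch of what password storage with a slow, salted algorithm might look like in JavaScript, the following uses scrypt from Node.js's built-in crypto module. The function names, the 16-byte salt, the 64-byte key length and the "salt:hash" storage format are all illustrative assumptions, not recommendations taken from this page.

```javascript
// Sketch only: Node.js built-in 'crypto' module; parameter choices are illustrative.
const crypto = require('crypto');

// Hash a password with scrypt and a fresh random salt.
// Returns "salt:hash", both hex-encoded (a hypothetical storage format).
function hashPassword(password) {
    const salt = crypto.randomBytes(16);
    const hash = crypto.scryptSync(password, salt, 64);
    return salt.toString('hex') + ':' + hash.toString('hex');
}

// Re-derive the hash using the stored salt, then compare in constant time.
function verifyPassword(password, stored) {
    const [saltHex, hashHex] = stored.split(':');
    const expected = Buffer.from(hashHex, 'hex');
    const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
    return crypto.timingSafeEqual(actual, expected);
}

const stored = hashPassword('correct horse');
console.log(verifyPassword('correct horse', stored));
console.log(verifyPassword('wrong horse', stored));
```

Because each call draws a new salt, hashing the same password twice gives different stored values, which is what defeats precomputed lookup tables.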
SHA-1 is one of the most widely used hash algorithms. It is used in SSL (Secure Sockets Layer), PGP (Pretty Good Privacy), XML Signatures, and in Microsoft’s Xbox; the Git source-code management system uses SHA-1 hashes extensively, and it is used in hundreds of other applications (including from IBM, Cisco, Nokia, etc). It is defined in the NIST (National Institute of Standards and Technology) standard ‘FIPS 180-4’. NIST also provides a number of test vectors to verify correctness of implementation. There is a good description at Wikipedia.
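Checking an implementation against test vectors can be sketched as below. The three message/digest pairs are the widely published SHA-1 examples (the empty string, ‘abc’ and the 448-bit message); checkVectors is a hypothetical helper, and Node's built-in crypto module stands in here for whatever implementation is actually under test.

```javascript
// Sketch only: a tiny test-vector harness; Node's 'crypto' is the stand-in implementation.
const crypto = require('crypto');

// Well-known SHA-1 test vectors: [message, expected hex digest].
const vectors = [
    ['', 'da39a3ee5e6b4b0d3255bfef95601890afd80709'],
    ['abc', 'a9993e364706816aba3e25717850c26c9cd0d89d'],
    ['abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq',
        '84983e441c3bd26ebaae4aa1f95129e5e54670f1'],
];

// Run a hashing function over the vectors; return the messages that failed.
function checkVectors(hashFn) {
    return vectors
        .filter(([msg, expected]) => hashFn(msg) !== expected)
        .map(([msg]) => msg);
}

// Replace referenceSha1 with the function you want to verify.
const referenceSha1 = msg => crypto.createHash('sha1').update(msg, 'utf8').digest('hex');
const failures = checkVectors(referenceSha1);
console.log(failures.length === 0 ? 'all vectors pass' : failures);
```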
Note on security: cryptanalysis published in 2005 showed SHA-1 to be weaker than its theoretical strength. Cryptanalysis is complex (and I’m no expert), but Xiaoyun Wang and colleagues effectively announced that, given thousands of years of supercomputer time, a ‘collision pair’ could be found. A collision alone does not compromise every use of a cryptographic hash (for many uses, a ‘pre-image’ attack would be necessary), but it does matter wherever collision resistance is relied upon, as in digital signatures. A practical SHA-1 collision was publicly demonstrated in 2017, so SHA-1 should no longer be regarded as secure for such purposes. NIST had in any case recommended that federal agencies migrate to SHA-2 algorithms for most purposes by 2010.
This script is oriented toward hashing text messages rather than binary data. The standard considers hashing byte-stream (or bit-stream) messages only. Text which contains (multi-byte) characters outside ISO 8859-1 (i.e. accented characters outside Latin-1 or non-European character sets – anything with Unicode code-point above U+FF), can’t be encoded 4-per-word, so the script defaults to encoding the text as UTF-8 before hashing it.
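The encoding step can be illustrated with a short sketch. It is not the script's own code: utf8Bytes and packWords are hypothetical helpers, and TextEncoder (available in modern browsers and Node.js) is assumed for the UTF-8 conversion.

```javascript
// Encode text as UTF-8 bytes, so every character becomes one or more bytes in 0..255.
function utf8Bytes(text) {
    return new TextEncoder().encode(text);
}

// Pack bytes big-endian into 32-bit words, four bytes per word;
// a short final word is zero-filled on the right.
function packWords(bytes) {
    const words = new Array(Math.ceil(bytes.length / 4)).fill(0);
    for (let i = 0; i < bytes.length; i++) {
        words[i >> 2] |= bytes[i] << (24 - 8 * (i % 4));
    }
    return words.map(w => w >>> 0); // present as unsigned 32-bit values
}

const latin1 = utf8Bytes('abc');  // 3 characters, 3 bytes
const euro = utf8Bytes('\u20ac'); // euro sign U+20AC: 1 character, 3 bytes (e2 82 ac)
console.log(latin1.length, euro.length);
console.log(packWords(latin1).map(w => w.toString(16)));
```

A code point above U+FF does not fit in a single byte, which is why the text is turned into a byte sequence first and only then packed four bytes to a word.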
Notes on the implementation of the preprocessing stage:
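// the message length in bits occupies the last two 32-bit words of the final block
M[N-1][14] = Math.floor(((msg.length-1)*8) / Math.pow(2, 32)); // high 32 bits: JavaScript takes shift counts modulo 32, so '>>> 32' would not shift at all
M[N-1][15] = ((msg.length-1)*8) & 0xffffffff; // low 32 bits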
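For context, the whole preprocessing stage described in FIPS 180-4 (append a ‘1’ bit, pad with zeros, then store the message length in bits in the final 64 bits) can be sketched as follows. This is an illustrative reconstruction working on a byte array, not the script's own code; padToBlocks is a hypothetical name.

```javascript
// Sketch of SHA-1 preprocessing for a byte array (Uint8Array or array of 0..255).
// Returns an array of N blocks, each an array of sixteen 32-bit words (512 bits).
function padToBlocks(bytes) {
    const withBit = bytes.length + 1;        // message plus the 0x80 byte (the '1' bit)
    const N = Math.ceil((withBit + 8) / 64); // 8 further bytes hold the bit-length
    const M = [];
    for (let i = 0; i < N; i++) {
        M[i] = new Array(16).fill(0);
        for (let j = 0; j < 16; j++) {
            for (let k = 0; k < 4; k++) {
                const pos = i * 64 + j * 4 + k;
                // past the end of the message: 0x80 first, then zero padding
                const b = pos < bytes.length ? bytes[pos]
                    : (pos === bytes.length ? 0x80 : 0);
                M[i][j] = ((M[i][j] << 8) | b) >>> 0;
            }
        }
    }
    const bitLength = bytes.length * 8;
    // bit-ops work on 32 bits only, so the high word uses arithmetic instead
    M[N - 1][14] = Math.floor(bitLength / Math.pow(2, 32));
    M[N - 1][15] = bitLength >>> 0;
    return M;
}

const blocks = padToBlocks(new TextEncoder().encode('abc'));
console.log(blocks.length, blocks[0][0].toString(16), blocks[0][15]);
```

For ‘abc’ this gives a single block whose first word is 0x61626380 and whose last word is 24, the message length in bits.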
Note that what is returned is the textual hexadecimal representation of the binary hash. This can be useful for instance for storing hashed passwords, but if you want to use the hash as a key to an encryption routine, for example, you will want to use the binary value not this textual representation.
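If the binary value is needed, the hexadecimal text can be converted back to bytes. A minimal sketch, assuming the hash is a plain hex string with two characters per byte; hexToBytes is a hypothetical helper, not part of the script.

```javascript
// Convert a hex string (two characters per byte) to a Uint8Array.
function hexToBytes(hex) {
    if (hex.length % 2 !== 0 || /[^0-9a-f]/i.test(hex)) {
        throw new Error('not a valid hex string');
    }
    const bytes = new Uint8Array(hex.length / 2);
    for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    }
    return bytes;
}

const digestHex = 'a9993e364706816aba3e25717850c26c9cd0d89d'; // SHA-1 of 'abc'
const digestBytes = hexToBytes(digestHex);
console.log(digestHex.length, digestBytes.length); // 40 hex characters, 20 bytes
```

The 20-byte array is the actual 160-bit hash; the 40-character string is just a readable rendering of it.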
Using Chrome on a low-to-middling Core i5 PC, this script will process the message at a speed of around 2 – 3 MB/sec. A single short message will be hashed in around 0.2 – 0.6 ms.
I offer these scripts for free use and adaptation to balance my debt to the open-source info-verse. You are welcome to re-use these scripts [under an MIT licence, without any warranty express or implied] provided solely that you retain my copyright notice and a link to this page.
If you have any queries or find any problems, contact me at ku.oc.epyt-elbavom@cne-stpircs.
© 2002-2016 Chris Veness