Post Process

Everything to do with E-discovery & ESI

EDD Basics: What is a hash value (or hash code)?

Posted by rjbiii on November 19, 2007

An installment of our EDD Basics Series.

It has been referred to as a “digital fingerprint” and compared to DNA. But what exactly is a hash value?

Briefly, and perhaps unhelpfully, a hash code is a “value,” in the form of a text string, that is calculated by a hash function. Basically, you take a bit of data, you chop it up and mix it all around, and you come up with a fixed-length value that is, for practical purposes, unique to that data. As long as it’s consistent (i.e., the same data always produces the same hash code), and as long as it’s effectively unique (it is mathematically unlikely that different sets of data will be assigned the same hash code), then the method can be used, among other things, to identify identical files on an I.S. system, regardless of the name of the file.
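To make the two properties concrete, here is a minimal sketch in Python using the standard hashlib module. The helper name hash_bytes and the sample memo text are made up for illustration; they do not come from any particular e-discovery tool.

```python
import hashlib

def hash_bytes(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hash code of `data` as a hexadecimal text string."""
    return hashlib.new(algorithm, data).hexdigest()

# Hypothetical file contents. File names play no part in the calculation.
memo_a = b"Quarterly results attached."
memo_b = b"Quarterly results attached."   # same content, imagine a different file name
memo_c = b"Quarterly results attached!"   # one character changed

# Consistent: identical data always yields the identical hash code.
same = hash_bytes(memo_a) == hash_bytes(memo_b)

# Effectively unique: even a one-character change yields a different hash code.
different = hash_bytes(memo_a) != hash_bytes(memo_c)

print(same, different)
```

Because only the content is fed to the hash function, renaming a file has no effect on its hash code.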

In electronic discovery, the hash code is used to remove duplicate files from review or production. Removing duplicates from review reduces costs by allowing a reviewer to see and make a decision on a document once (and only once); the unseen duplicates are then marked in the same manner as the reviewed file. Removing duplicates from production can reduce the size (and therefore cost) of the production. Some care must be taken to cross-reference the removed duplicates to the original file that is included. Furthermore, removing duplicate e-mail attachments can lead to confusion, and in some cases contention. E-mail messages often indicate the presence of attachments, and if those attachments are not present in a production, the requesting party will often point out that “documents are missing.”
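The de-duplication and cross-referencing described above might be sketched as follows. This is only an illustration under simple assumptions: documents are held in memory as a dictionary of name to content, the first file seen in each group is treated as the "original," and the function names group_by_hash and dedupe are hypothetical. Real review platforms handle this in their own ways, particularly for e-mail families.

```python
import hashlib
from collections import defaultdict

def group_by_hash(documents: dict[str, bytes]) -> dict[str, list[str]]:
    """Group document names by the SHA-1 hash code of their content."""
    groups = defaultdict(list)
    for name, content in documents.items():
        digest = hashlib.sha1(content).hexdigest()
        groups[digest].append(name)
    return dict(groups)

def dedupe(documents: dict[str, bytes]) -> tuple[list[str], dict[str, str]]:
    """Return (kept, cross_reference).

    `kept` lists one document per distinct hash code; `cross_reference`
    maps each removed duplicate back to the document kept in its place.
    """
    kept = []
    cross_reference = {}
    for names in group_by_hash(documents).values():
        original, *duplicates = names  # assumption: first file seen is the original
        kept.append(original)
        for duplicate in duplicates:
            cross_reference[duplicate] = original
    return kept, cross_reference

# Hypothetical collection: two identical memos under different names.
documents = {
    "memo.doc": b"Please review the attached contract.",
    "Copy of memo.doc": b"Please review the attached contract.",
    "budget.xls": b"Q4 budget figures",
}
kept, cross_reference = dedupe(documents)
print(kept)
print(cross_reference)
```

The cross_reference table is what lets a reviewer's decision on the kept file be applied to its removed duplicates, and lets a producing party account for files that were left out.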

The two most commonly used hash functions are MD5 and SHA-1. MD5 produces a 128-bit hash value, while SHA-1 produces a 160-bit hash value. Because each hexadecimal character represents four bits, an MD5 value is expressed as a 32-character hexadecimal number, while a SHA-1 value is expressed as a hexadecimal number with 40 characters.
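Those sizes are easy to check for yourself. The short sketch below uses Python's standard hashlib module; the sample input text is arbitrary, and the variable names are chosen only for illustration.

```python
import hashlib

data = b"any data at all"  # arbitrary sample input

md5_hex = hashlib.md5(data).hexdigest()
sha1_hex = hashlib.sha1(data).hexdigest()

# digest_size is reported in bytes; multiply by 8 to get bits.
md5_bits = hashlib.md5().digest_size * 8
sha1_bits = hashlib.sha1().digest_size * 8

# Each hexadecimal character carries 4 bits, so bits / 4 = characters.
print("MD5:  ", md5_bits, "bits,", len(md5_hex), "hex characters")
print("SHA-1:", sha1_bits, "bits,", len(sha1_hex), "hex characters")
```

The length of the hexadecimal string stays the same no matter how large or small the input data is.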


2 Responses to “EDD Basics: What is a hash value (or hash code)?”

  1. An example of a hash value might help your readers:
    51FEC3B6FCB1E7D5465575BED5DCDC1B8897AE5A

    Can you deduce what kind it is, MD5 or SHA-1?

    If you want to go deeper check out some of my articles on this, my favorite subject.
    http://ralphlosey.wordpress.com/computer-hash-5f0266c4c326b9a1ef9e39cb78c352dc/

    Thanks for bringing up this topic.

    Ralph

  2. rjbiii said

    Ralph is serious about hash codes. I believe the value in his comment is a SHA-1, because it’s 40 characters long. Am I right, Ralph?
