Post Process

Everything to do with E-discovery & ESI

Archive for the ‘Hash Values’ Category

EDD Basics: What is a hash value (or hash code)?

Posted by rjbiii on November 19, 2007

An installment of our EDD Basics Series.

It has been referred to as a “digital fingerprint” and compared to DNA. But what exactly is a hash value?

Briefly, and perhaps unhelpfully, a hash code is a fixed-length “value,” usually displayed as a string of hexadecimal characters, that is calculated by a hash function. Basically, you take a piece of data (such as the contents of a file), you chop it up and mix it all around, and you come up with a value that, for practical purposes, is unique to that data. The method has to be consistent: the same data must always produce the same hash code. It also has to be effectively unique: it must be mathematically unlikely to assign two different sets of data the same hash code. A method that meets both tests can be used, among other things, to identify identical files on an information system. Because the hash is calculated from a file's contents and not from its name, this works regardless of what the files are called.
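Here is a minimal Python sketch that makes the idea concrete, using the standard library's `hashlib` module. The file names and contents are made-up examples. MD5 is used only because it is one of the two functions discussed in this post. Two files with identical contents but different names produce the same hash. Changing a single character produces a completely different one.

```python
import hashlib

def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 hash of `data` as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()

# Hypothetical "files": the name is just a label; only the bytes are hashed.
files = {
    "contract_final.docx": b"Agreement between A and B, signed 2007-11-01.",
    "copy of contract.docx": b"Agreement between A and B, signed 2007-11-01.",
    "contract_draft.docx": b"Agreement between A and B, signed 2007-11-02.",
}

for name, content in files.items():
    # Identical contents yield identical hashes, whatever the file is named.
    print(f"{md5_of_bytes(content)}  {name}")
```

The first two lines of output share a hash. The third, whose contents differ by one digit, does not.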

In electronic discovery, the hash code is used to remove duplicate files from review or production. Removing duplicates from review reduces costs by allowing a reviewer to see, and make a decision on, a document once (and only once). The unseen duplicates are then coded the same way as the reviewed file. Removing duplicates from production can reduce the size (and therefore the cost) of the production. Some care must be taken to cross-reference each removed duplicate to the original file that was kept. Furthermore, removing e-mail attachments as duplicates can lead to confusion, and in some cases contention. E-mail messages often indicate the presence of attachments, and if those attachments are not present in a production, the requesting party will often point out that “documents are missing.”
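The deduplication and cross-referencing workflow described above can be sketched roughly as follows. This is an illustrative sketch, not how any particular review platform works. The document IDs, the hypothetical `deduplicate` and `propagate_coding` helpers, and the choice of SHA-1 are all assumptions. Each document is hashed, and only the first copy of each hash goes to review. A cross-reference table records which retained original each suppressed duplicate belongs to, so the reviewer's decision can be copied to the duplicates.

```python
import hashlib

def deduplicate(documents):
    """Group documents by SHA-1 hash (hypothetical helper).

    `documents` is a list of (doc_id, content_bytes) pairs in processing order.
    Returns (review_set, cross_reference):
      review_set      - doc_ids of the first occurrence of each unique content
      cross_reference - maps each suppressed duplicate doc_id to the doc_id
                        of the retained original
    """
    first_seen = {}        # hash -> doc_id of the retained original
    review_set = []
    cross_reference = {}
    for doc_id, content in documents:
        digest = hashlib.sha1(content).hexdigest()
        if digest in first_seen:
            cross_reference[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            review_set.append(doc_id)
    return review_set, cross_reference

def propagate_coding(decisions, cross_reference):
    """Copy each reviewed document's decision to its suppressed duplicates."""
    result = dict(decisions)
    for dup_id, original_id in cross_reference.items():
        result[dup_id] = decisions[original_id]
    return result

# Usage with made-up documents.
docs = [
    ("DOC-0001", b"Q3 sales memo"),
    ("DOC-0002", b"Board minutes"),
    ("DOC-0003", b"Q3 sales memo"),  # exact duplicate of DOC-0001
]
review_set, xref = deduplicate(docs)
print(review_set)  # ['DOC-0001', 'DOC-0002']
print(xref)        # {'DOC-0003': 'DOC-0001'}

coding = propagate_coding({"DOC-0001": "Responsive", "DOC-0002": "Not responsive"}, xref)
print(coding["DOC-0003"])  # Responsive
```

This sketch deduplicates every item independently. That is exactly how an attachment can end up suppressed while its parent e-mail is produced. Processing tools often deduplicate at the family level instead (an e-mail together with its attachments) to avoid the “documents are missing” problem.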

The two most commonly used hash functions are MD5 and SHA-1. MD5 produces a 128-bit hash value, while SHA-1 produces a 160-bit value. An MD5 value is usually expressed as a 32-character hexadecimal number, and a SHA-1 value as a 40-character hexadecimal number (each hexadecimal character represents 4 bits).
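You can check these sizes directly with Python's `hashlib`. The input string is arbitrary. `digest_size` reports the length in bytes, so multiplying by 8 gives bits. Since each hexadecimal digit encodes 4 bits, 128 / 4 = 32 and 160 / 4 = 40 characters.

```python
import hashlib

data = b"Everything to do with E-discovery & ESI"  # arbitrary sample input

results = {}
for name, func in (("MD5", hashlib.md5), ("SHA-1", hashlib.sha1)):
    h = func(data)
    bits = h.digest_size * 8        # digest_size is in bytes
    hex_value = h.hexdigest()       # 4 bits per hex character
    results[name] = (bits, len(hex_value))
    print(f"{name}: {bits} bits, {len(hex_value)} hex characters: {hex_value}")
```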

Posted in EDD Basics, Hash Values

Case Blurb: Lorraine; Use of Hash Values to Authenticate ESI under 901(b)(4)

Posted by rjbiii on September 17, 2007

One method of authenticating electronic evidence under Rule 901(b)(4) is the use of “hash values” or “hash marks” when making documents.

  • DEFINITION: A unique numerical identifier that can be assigned to a file, a group of files, or a portion of a file, based on a standard mathematical algorithm applied to the characteristics of the data set. The most commonly used algorithms, known as MD5 and SHA, will generate numerical values so distinctive that the chance that any two data sets will have the same hash value, no matter how similar they appear, is less than one in one billion. ‘Hashing’ is used to guarantee the authenticity of an original data set and can be used as a digital equivalent of the Bates stamp used in paper document production.
  • Hash values can be inserted into original electronic documents when they are created to provide them with distinctive characteristics that will permit their authentication under Rule 901(b)(4). Also, they can be used during discovery of electronic records to create a form of electronic “Bates stamp” that will help establish the document as electronic.
  • A party that seeks to introduce its own electronic records may have just as much difficulty authenticating them as one that attempts to introduce the electronic records of an adversary.
    • Because it is so common for multiple versions of electronic documents to exist, it sometimes is difficult to establish that the version that is offered into evidence is the “final” or legally operative version.
    • This can plague a party seeking to introduce a favorable version of its own electronic records, when the adverse party objects that it is not the legally operative version, given the production in discovery of multiple versions.
    • Use of hash values when creating the “final” or “legally operative” version of an electronic record can insert distinctive characteristics into it that allow its authentication under Rule 901(b)(4).
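The sketch below gives a rough illustration of the idea in the bullets above. It records the hash of the version designated “final” at the time it is created. Later, it checks whether a copy offered into evidence matches that record. The helper names, the in-memory `hash_log`, and the use of SHA-1 are hypothetical choices for illustration. The hash is recorded alongside the document rather than written into the file, because inserting the value into the file would itself change the file's hash. A match shows only that the offered copy is bit-for-bit identical to the logged one. Whether that version is the legally operative one remains a matter of proof.

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(content: bytes) -> str:
    """SHA-1 hex digest of a document's bytes."""
    return hashlib.sha1(content).hexdigest()

def record_final_version(log: dict, doc_name: str, content: bytes) -> None:
    """Hypothetical helper: log the hash of the version declared 'final'."""
    log[doc_name] = {
        "sha1": fingerprint(content),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

def matches_final_version(log: dict, doc_name: str, offered: bytes) -> bool:
    """True if the offered copy is bit-for-bit identical to the logged final version."""
    entry = log.get(doc_name)
    return entry is not None and entry["sha1"] == fingerprint(offered)

# Usage with made-up document contents.
hash_log = {}
final = b"Policy terms, version 3 (executed)."
draft = b"Policy terms, version 2 (draft)."
record_final_version(hash_log, "policy.pdf", final)

print(matches_final_version(hash_log, "policy.pdf", final))  # True
print(matches_final_version(hash_log, "policy.pdf", draft))  # False
```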

Lorraine v. Markel Amer. Ins. Co., 241 F.R.D. 534 (D. Md. 2007) (citations omitted).

Posted in 4th Circuit, Admissibility of ESI, Authentication, Blogroll, Case Blurbs, D. Md., FRE 901(b)(4), Hash Values, Magistrate Judge Paul W. Grimm