Today Post Process introduces a new series of posts, called “E-Discovery Pitfalls,” that will describe cases in which the e-discovery process went wrong. Today’s installment discusses a case of missing attachments, and the matter is PSEG Power New York, Inc. v. Alberici Constructors, Inc., Slip Copy 1:05-cv-00657-DNH-RFT (N.D.N.Y., Sept. 7, 2007) [pdf file]. A summary can be viewed here.
Law.com has an article that quickly brings us to the crux:
PSEG Power New York Inc., turned over more than 3,000 e-mails and 211,000 pages of documents to a legal adversary, but a magistrate judge has found that the company still failed to comply with a discovery request.
Magistrate Judge Randolph F. Treece directed PSEG to try again to produce the materials in a coherent form as requested by Alberici Constructors Inc., its adversary in a suit in the Northern District of New York. PSEG must do so at its own expense, despite a plea from the energy company to shift some of the e-discovery costs to Alberici.
The opinion in this large construction-based action opens with the magistrate expressing his understanding of the difficulties posed in this case by electronic discovery:
For nearly six months, the parties and the Court have been grappling with an electronic discovery monstrosity with the hope that it could be corralled and definitively resolved, thereby obviating the need for motion practice. Alas, attempts to resolve the issue in lieu of briefs fell woefully beyond the parties’ grasp and, as the last straw, they have set the matter at our feet for appropriate resolution.
PSEG responded to Alberici’s first document request by producing, over a period of six months, 211,000 pages of hard copy, which Alberici then converted to electronic format (TIF images) and loaded into a litigation database at its own expense. PSEG also produced a disc of e-mails. Alberici, in turn, had produced a large volume of documents to PSEG. At this point, all is straightforward. PSEG and Alberici each produced their own documents at their own expense, and each paid for post-production conversion of the other party’s documents as well. Here’s where the tale goes astray. Judge Treece describes the problem:
In January 2007, it became evident to Alberici that PSEG had produced emails without the attachments which were referenced as being a part of the emails. Apparently a technical glitch occurred whereby numerous emails were “divorced” from their attachments caused by limitations in the downloading software. Dkt. No. 51 at p. 1, Ex. F, Pl.’s Lt., dated Feb. 20, 2007. The separation of the emails from the attachments happened at the interface between the different software used by PSEG and the vendor when reducing the documents in a form that could be reviewed by counsel. Id. It appears that the “vendor’s software was not compatible with the HTML format in which PSEG had provided its documents and that this incompatibility had resulted in the parent child link between the emails and attachments being broken.”
In short, the connection between the e-mail messages and their attachments was lost. This means that in responding to Alberici’s discovery requests, PSEG failed to produce relevant material in a usable form. The good news is that the required data exists, so this isn’t a case of evidence destruction. The bad news is that considerable technical expertise will now have to be put to use either to 1) re-process the e-mails in a manner that preserves the connection to their attachments, or 2) use the existing evidence to (in the court’s words) “re-marry” the two.
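To make option 1 a little more concrete, here is a minimal sketch of what “preserving the connection” can look like at processing time: each parent e-mail and each of its attachments gets its own control number, and every attachment record explicitly stores its parent’s number, similar in spirit to the family fields commonly carried in review-platform load files. This is illustrative only and rests on assumptions: it treats the messages as raw MIME bytes (the opinion says PSEG’s data was in HTML format, and nothing here reflects what PSEG or its vendor actually used), and the `DOC` prefix, the record fields, and the function names are all hypothetical. It uses only Python’s standard `email` package.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def assign_family_ids(raw_messages, prefix="DOC"):
    """Assign sequential control numbers to each e-mail and its attachments.

    Every attachment record carries its parent's control number, so the
    parent/child link is stored explicitly instead of being implied by
    file order. The prefix and record fields are hypothetical.
    """
    records = []
    counter = 1
    for raw in raw_messages:
        msg = BytesParser(policy=policy.default).parsebytes(raw)
        parent_id = f"{prefix}{counter:07d}"
        counter += 1
        records.append({"id": parent_id, "parent_id": None, "kind": "email",
                        "name": str(msg["subject"] or "")})
        # iter_attachments() yields the immediate sub-parts that are not body parts.
        for part in msg.iter_attachments():
            child_id = f"{prefix}{counter:07d}"
            counter += 1
            records.append({"id": child_id, "parent_id": parent_id,
                            "kind": "attachment",
                            "name": part.get_filename() or "(unnamed)"})
    return records


def sample_message(subject, attachments):
    """Build a raw MIME message with the given (filename, bytes) attachments."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "pm@example.com"
    msg["To"] = "site@example.com"
    msg.set_content("See attached.")
    for filename, data in attachments:
        msg.add_attachment(data, maintype="application", subtype="pdf",
                           filename=filename)
    return msg.as_bytes()


if __name__ == "__main__":
    raw = [sample_message("Change order 12", [("schedule.pdf", b"%PDF-1.4 ...")]),
           sample_message("Site meeting notes", [])]
    for rec in assign_family_ids(raw):
        print(rec)
```

The design choice that matters is recording the relationship as data at the moment the message is unpacked; once a parent and its children are written out as separate, unlinked files, as happened in PSEG’s case, that relationship may not be recoverable later.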
Judge Treece tells us what the parties did upon discovering the problem:
Throughout this ordeal, the raw data was not lost. All 750 gigabytes of unfiltered data remained intact in its original format. Dkt. Nos. 54, n.5; 57 at p. 4. Realizing that the underlying data still existed, the next proposal included PSEG sharing with Alberici’s vendor a sample of the metadata for analysis. However, the dearth of metadata related to the emails and attachments rendered this proposal fruitless. Id. In the interim, the parties’ vendors explored other ways to reverse engineer the available data and “re-marry” the attachments to their emails. This exploration was for naught inasmuch as the data necessary to complete this task was destroyed during PSEG’s collection and formatting of the emails. Id.
So, although the “raw data” is still intact, the data resulting from downstream processing is in such a state as to make re-connecting each parent e-mail to its child attachments impossible. At this stage, I’d like to point out something very important. Changes to the data occurred in the EDD processing (or loading, or ingestion, or whatever you want to call it) phase, rendering that data unusable for the task at hand. However, a reservoir of “raw data” remains available because the data was (one supposes) collected correctly. This illustrates the vital importance of a defensible collection process. The opinion gives no description of how the data was collected, because the collection isn’t being challenged. Nevertheless, without a defensible process in place, there would be no acceptable “raw data” reservoir to fall back on, and the issue might not be late production of attachments in an appropriate form but destruction of data, a much more serious affair.
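The opinion doesn’t say how PSEG collected its data, so the following is only an illustration of one practice often associated with a defensible collection: recording a cryptographic hash of every collected file at collection time, so the raw reservoir can later be shown to be complete and unchanged. This minimal sketch assumes the collection is simply a directory of files; the function names are hypothetical, and SHA-256 is chosen here for illustration (MD5 and SHA-1 hashes are also widely used for this purpose in e-discovery).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large mail containers need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its SHA-256 digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root, manifest):
    """Compare the files under root with a manifest recorded at collection time.

    Returns (missing, changed): paths that are no longer present, and paths
    whose contents no longer match the recorded digest.
    """
    current = build_manifest(root)
    missing = sorted(set(manifest) - set(current))
    changed = sorted(path for path in manifest.keys() & current.keys()
                     if manifest[path] != current[path])
    return missing, changed
```

In use, the manifest would be built once when the data is collected and stored separately; running `verify_manifest` against the reservoir before any re-processing should return two empty lists if the raw data is as it was collected.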
After the parties’ attempts to collaborate on solving the problem of the missing attachments failed to produce an accord, the magistrate got involved:
[T]he issues are several-fold: (1) is Alberici entitled to receive the emails with the related attachments together as opposed to their current state of separation, lacking coordinated identification with each other; (2) although PSEG has provided these emails and attachments in hard copy albeit not “married,” is PSEG obligated to provide these documents in their original format; and (3) if re-production is required, which party bears the cost of this production?
Alberici argued that the e-mails and attachments should be connected in some easily identifiable way. PSEG countered that Alberici “is impermissibly seeking a ‘perfect’ or ‘ideal’ production, regardless of expense or benefit.” Furthermore, “[s]uch a re-do effort would be duplicative and entirely unnecessary in its view.”
PSEG offered its own alternative to solving the problem:
PSEG wants Alberici to identify a concise group of attachments that are important and necessary to Alberici and then it would consider producing said attachments, however, reserving its right to assert that it may be irrelevant or non-responsive or privileged. Dkt. No. 54 at p. 4. Or, if Alberici insists on a re-production, PSEG is willing to provide them but at Alberici’s expense. Id.
So…we’ve already produced the stuff, although not in an optimal state. If you want it produced yet again, you pay for it. How did that go over with His Honor? He began by citing the amended FRCP 34(b):
(ii) if a request does not specify the form or forms for producing electronically stored information, a responding party must produce the information in a form or forms in which it is ordinarily maintained or in a form or forms that are reasonably usable; and (iii) a party need not produce the same electronically stored information in more than one form.
Obviously, the commonsensical purpose of this mandate has always been to prevent massive dumping of documents, without form or direction, thereby alleviating an incalculable burden upon the requesting party of searching for the proverbial needle in a haystack. In this respect, notwithstanding Rule 34(b)’s amendments, (ii) and (iii), PSEG would still have to produce business records as kept in the regular course of business or in such a manner that Alberici could readily find a necessary document or two. It has also been a seminal rule that the responding party would not have to satisfy the requesting party’s whim to have the documents produced in various forms.
The Judge then cuts to the chase:
Clearly these 3000 emails and related attachments were not produced in accordance with this mandate, and, as we now know have caused considerable consternation and agony for both parties, which the revised statute was attempting to avoid. Normally, one would expect that an email and its attachment would have been kept together in the regular course of business, and the production of said documents would have followed suit. Here, the difficulty has been that there was not sufficient identifying information to match attachments with their respective email. We accept Alberici’s proffer that it has spent considerable time employing different methodologies to unearth attachments to correspond with the emails it has found to be pertinent. Attempting to reunite these documents has been nothing short of a donnybrook for Alberici. It has been frustrated if not completely hamstrung in locating these documents. Compounding Alberici’s angst is the disadvantage it has been placed in preparing for depositions. In essence, the first production of emails and attachments has been ineffectual.
The judge then takes a shot at PSEG’s vendor:
We acknowledge that discovery production is rarely perfect or ideal, yet this discovery quagmire created by PSEG’s vendor falls woefully short of comporting with the spirit of Rule 34.
Ultimately, the judge held that PSEG must take responsibility for the consequences of its own vendor’s misadventures: he denied PSEG’s request for cost-shifting and granted Alberici’s motion to compel.
On one project I worked on, we noticed missing attachments for some e-mails, but not for others. This differed from what happened with PSEG, in that the attachments the e-mail messages indicated should be present didn’t exist at all. After quite a bit of investigation, we discovered an error in collection, which had been performed by the client’s own IT staff. Remote users on laptops, logged in over the web, would receive e-mails but wouldn’t actually download the attachments unless they requested them (by clicking on them). So if a user never viewed an attachment, it didn’t exist on that user’s system, despite the e-mail’s indication that it should be present. As a result, the data had to be collected again. Additionally, some rather extensive data manipulation was needed to get the originally processed data and the re-collected data to match. It was never very pretty, but at least the issue was discovered before production.