Post Process

Everything to do with E-discovery & ESI


Mirror, Mirror, on the wall…

Posted by rjbiii on June 24, 2008

Wired has a pair of articles out addressing life in the age of the Petabyte. A Petabyte is 1,024 Terabytes, and a Terabyte is in turn 1,024 Gigabytes.

The internet came into being as a tool to enhance communication and collaboration. And it has. But it has also changed the behavior of its users, and many of their actions are now logged and stored. Combine that with a growing array of tools that record and store representations of human activity (CCTV, PDAs, etc.) and you can see that there is a growing mass of data logging the details of individual, group, and global behavior. In the vernacular of copyright, records of our actions are now, more than ever, “fixed in a tangible medium,” and available for all sorts of purposes that wouldn’t have been possible in even the recent past.

Wired’s article on the “Petabyte Age” has a chart illustrating the differences in size between a Terabyte and a Petabyte, with various data points in between. To cut to the chase, a Terabyte is viewed as a $200 hard drive that holds 260,000 songs, while a Petabyte is the total information “processed by Google’s servers every 72 minutes.” If you have a moment, click on the above link; there are some interesting data points noted.
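To put those figures side by side, here is a minimal back-of-the-envelope sketch in Python. It assumes the binary convention (1,024 Gigabytes to the Terabyte, 1,024 Terabytes to the Petabyte) and takes the song count and the 72-minute figure from the Wired chart at face value; the constant and function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the "Petabyte Age" figures.
# Assumption: binary units (1,024 per step). Names are illustrative only.

GIGABYTES_PER_TERABYTE = 1024
TERABYTES_PER_PETABYTE = 1024

# Figures as quoted from the Wired chart.
SONGS_PER_TERABYTE = 260_000
MINUTES_PER_PETABYTE = 72  # time for Google's servers to process one Petabyte


def gigabytes_per_petabyte() -> int:
    """Number of Gigabytes in one Petabyte."""
    return GIGABYTES_PER_TERABYTE * TERABYTES_PER_PETABYTE


def songs_per_petabyte() -> int:
    """Songs that would fit in a Petabyte at the chart's per-Terabyte rate."""
    return SONGS_PER_TERABYTE * TERABYTES_PER_PETABYTE


def petabytes_per_day() -> float:
    """Petabytes processed in 24 hours at one Petabyte per 72 minutes."""
    minutes_per_day = 24 * 60
    return minutes_per_day / MINUTES_PER_PETABYTE


if __name__ == "__main__":
    print(f"Gigabytes per Petabyte: {gigabytes_per_petabyte():,}")
    print(f"Songs per Petabyte:     {songs_per_petabyte():,}")
    print(f"Petabytes per day:      {petabytes_per_day():.0f}")
```

Under those assumptions, a Petabyte works out to a little over a million Gigabytes, and one Petabyte every 72 minutes comes to roughly 20 Petabytes a day.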

While the first article is interesting, the second article, The End of Theory, brings home the cogent point. The article proposes that the availability of statistics based on real behavior, rather than on imperfect models, will transform science:

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

In the words of the article, more is not just more; “more is different.” Google research director Peter Norvig is quoted as saying that “All models are wrong, and increasingly you can succeed without them.” The world is becoming a great big database.

Or perhaps just a series of smaller databases. The question we face in e-discovery concerns the rapid identification, organization, and cataloging of disparate types of data. In light of the discussion of the effectiveness of search terms by Judge Grimm in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D.Md. May 29, 2008), it seems certain that judicial scrutiny of search criteria formulation, and the opposition’s objections to that formulation, will only increase.

Wired describes our brave new world as one in which:

[] massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

The use of these statistics in the legal arena will be something to follow closely. Could they be used to measure “community standards,” for example? Lawrence Walters, a defense attorney in a Florida obscenity case, argues that they can be used to help clarify what in the past was purely subjective. A New York Times piece has the details:

In a novel approach, the defense in an obscenity trial in Florida plans to use publicly accessible Google search data to try to persuade jurors that their neighbors have broader interests than they might have thought.

In the trial of a pornographic Web site operator, the defense plans to show that residents of Pensacola are more likely to use Google to search for terms like “orgy” than for “apple pie” or “watermelon.” The publicly accessible data is vague in that it does not specify how many people are searching for the terms, just their relative popularity over time. But the defense lawyer, Lawrence Walters, is arguing that the evidence is sufficient to demonstrate that interest in the sexual subjects exceeds that of more mainstream topics — and that by extension, the sexual material distributed by his client is not outside the norm.

In the movie The Neverending Story, the hero (named Atreyu) is forced to view his reflection in a mirror that reveals to him “who he really is,” stripped of all flattering notions. The Age of the Petabyte gives us a mirror, of a sort, that reveals things about ourselves that we might otherwise have disputed. Mr. Walters is trying to use that mirror, and hold it up for the Florida jurors. Whether his mirror is sufficiently objective is a discussion for another time, but we will see these methods used more frequently as time goes on. Atreyu handled it. It will be interesting to see what we do with it.

Posted in Articles, Data Manipulation, Data Sources, Search Protocols, Trends