Post Process

Everything to do with E-discovery & ESI

Archive for the ‘Search Engine Technology’ Category

Around the block: 10/18/10

Posted by rjbiii on October 18, 2010

A few articles of note:

Unsurprisingly, to those who have been paying attention, some of Facebook’s apps transmit personally identifiable data. This breaks Facebook’s rules and raises many of the same privacy questions that have dogged the site in recent times. From a WSJ article on the issue:

The problem has ties to the growing field of companies that build detailed databases on people in order to track them online—a practice the Journal has been examining in its What They Know series. It’s unclear how long the breach was in place. On Sunday, a Facebook spokesman said it is taking steps to “dramatically limit” the exposure of users’ personal information.

“A Facebook user ID may be inadvertently shared by a user’s Internet browser or by an application,” the spokesman said. Knowledge of an ID “does not permit access to anyone’s private information on Facebook,” he said, adding that the company would introduce new technology to contain the problem identified by the Journal.

The Ensigns blog has posted an interesting article on search, perhaps inaptly entitled E-Discovery Search: The Truth, the Statistical Truth, and Nothing But the Statistical Truth. Despite what the title might suggest, it is a very good primer on search in general rather than on statistical methodology. An example is this passage on Latent Semantic Indexing:

What does “Latent” mean? Roughly speaking, it means “hidden.” And “Semantic” means, again roughly, “meaning.”

So, the phrase is actually descriptive of what we are trying to accomplish: find the hidden meanings (patterns) in a collection of documents, not because of the specific words we choose as input, but because of the other words in the documents containing the words we did choose and their “co-occurrence” with words in other documents, documents which do not contain our search terms.
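The co-occurrence idea the article describes can be made concrete with a minimal sketch of latent semantic indexing on a toy term-document matrix. Everything here is invented for illustration (the tiny corpus, the term list, the k=2 truncation); real systems operate on millions of documents, but the mechanism is the same: a truncated SVD lets a query match a document that shares none of its literal terms.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Hypothetical corpus: d0 = "car engine", d1 = "automobile engine",
# d2 = "banana banana". Note that d1 never contains the word "car".
terms = ["car", "automobile", "engine", "banana"]
A = np.array([
    [1, 0, 0],   # car
    [0, 1, 0],   # automobile
    [1, 1, 0],   # engine
    [0, 0, 2],   # banana
], dtype=float)

# Truncated SVD: keep only the k strongest latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

def fold_in(query_vec):
    """Project a term-space query vector into the k-dim latent space."""
    return (query_vec @ Uk) / sk

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

docs_latent = (Vtk * sk[:, None]).T      # one row per document
q = np.array([1.0, 0.0, 0.0, 0.0])       # query: the single term "car"
q_latent = fold_in(q)

sims = [cosine(q_latent, d) for d in docs_latent]
# d1 shares no literal term with the query, yet the co-occurrence of
# "engine" with both "car" and "automobile" links them in latent space.
print(sims)
```

The keyword dot product of the query with d1 is exactly zero, yet its latent-space similarity is high: that is the "hidden meaning" the article's passage is describing.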

Law.com offers 10 helpful tips for managing cases. In 10 Tips for Effective Litigation Case Management, there is more than just a nod to applying project management principles to help with ROI and making decisions, an approach of which I greatly approve. From the article:

The past decade has ushered in significant new challenges in litigation case management. These include: the explosion in electronic discovery, the increasing importance of cross-border cooperation in litigation and investigations, and the expectation that counsel will keep abreast of, and communicate to their clients, changes in relevant legal rules and precedent on a virtually real-time basis.

These challenges have been accelerated by the global financial crisis, which has led clients to become more comfortable asking for, and coming to expect, services and fee arrangements tailored to their unique needs and goals. We are in an era of increasing competition and increasingly sophisticated legal consumers. The goal must be maximizing client value without sacrificing quality service. In the end, after all, the business of law really is all about the client and achieving its objectives.


Posted in Articles, Privacy, Project Management, Search Engine Technology

The Search Engine as Electronic Brain: Wolfram Alpha goes Live in May

Posted by rjbiii on March 11, 2009

CNet blogger Dan Farber discusses the upcoming release of Stephen Wolfram’s latest venture: a new search engine that is being touted as a breakthrough:
[Entrepreneur Nova] Spivack gave some insight into how Wolfram’s search engine works:

Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge.

For example, it contains formal models of much of what we know about science — massive amounts of data about various physical laws and properties, as well as data about the physical world.

Based on this you can ask it scientific questions and it can compute the answers for you. Even if it has not been programmed explicitly to answer each question you might ask it.

But science is just one of the domains it knows about–it also knows about technology, geography, weather, cooking, business, travel, people, music, and more.

It also has a natural language interface for asking it questions. This interface allows you to ask questions in plain language, or even in various forms of abbreviated notation, and then provides detailed answers.

The vision seems to be to create a system which can do for formal knowledge (all the formally definable systems, heuristics, algorithms, rules, methods, theorems, and facts in the world) what search engines have done for informal knowledge (all the text and documents in various forms of media).

As the article mentions, Wolfram is the creator of Mathematica and the author of a book (not always warmly received) entitled A New Kind of Science.

Posted in Articles, Search Engine Technology, Technology, Trends

Case Blurb: YouTube; Denying Motion Compelling the Production of Source Code to Opponents

Posted by rjbiii on August 12, 2008

Plaintiffs move jointly pursuant to Fed. R. Civ. P. 37 to compel [Defendants] to produce certain electronically stored information and documents, including a critical trade secret: the computer source code which controls both the YouTube.com search function and Google’s internet search tool “Google.com”. [Defendants] cross-move pursuant to Fed. R. Civ. P. 26(c) for a protective order barring disclosure of that search code, which they contend is responsible for Google’s growth “from its founding in 1998 to a multi-national presence with more than 16,000 employees and a market valuation of roughly $150 billion”, and cannot be disclosed without risking the loss of the business. Viacom Int’l Inc. v. YouTube Inc., 2008 U.S. Dist. LEXIS 50614, 7-8 (S.D.N.Y. July 1, 2008) (internal citations removed).

YouTube and Google maintain that “no source code in existence today can distinguish between infringing and non-infringing video clips — certainly not without the active participation of rights holders”, and Google engineer Amitabh Singhal declares under penalty of perjury that:

The search function employed on the YouTube website was not, in any manner, designed or modified to facilitate the location of allegedly infringing materials. The purpose of the YouTube search engine is to allow users to find videos they are looking for by entering text-based search terms. In some instances, the search service suggests search terms when there appears to be a misspelling entered by the user and attempts to distinguish between search terms with multiple meanings. Those functions are automated algorithms that run across Google’s services and were not designed to make allegedly infringing video clips more prominent in search results than non-infringing video clips. Indeed, Google has never sought to increase the rank or visibility of allegedly infringing material over non-infringing material when developing its search services.

Id. at *9-10 (internal citations removed).

Plaintiffs argue that the best way to determine whether those denials are true is to compel production and examination of the search code. Nevertheless, YouTube and Google should not be made to place this vital asset in hazard merely to allay speculation. A plausible showing that YouTube and Google’s denials are false, and that the search function can and has been used to discriminate in favor of infringing content, should be required before disclosure of so valuable and vulnerable an asset is compelled.

Nor do plaintiffs offer evidence supporting their conjecture that the YouTube.com search function might be adaptable into a program which filters out infringing videos. Plaintiffs wish to “demonstrate what Defendants have not done but could have” to prevent infringements, (plaintiffs’ italics), but there may be other ways to show that filtering technology is feasible FN2 and reasonably could have been put in place. Id. at *10 (internal citations removed).

FN2: In the Viacom action:

Viacom is currently using fingerprinting technology provided by a company called Auditude in order to identify potentially infringing clips of Viacom’s copyrighted works on the YouTube website. The fingerprinting technology automatically creates digital “fingerprints” of the audio track of videos currently available on the YouTube website and compares those fingerprints against a reference library of digital fingerprints of Viacom’s copyrighted works. As this comparison is made, the fingerprinting technology reports fingerprint matches, which indicate that the YouTube clip potentially infringes one of Viacom’s copyrighted works.

Finally, the protections set forth in the stipulated confidentiality order are careful and extensive, but nevertheless not as safe as nondisclosure. There is no occasion to rely on them, without a preliminary proper showing justifying production of the search code.

Therefore, the cross-motion for a protective order is granted and the motion to compel production of the search code is denied. Id. at *11.
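The fingerprint-matching workflow described in FN2 can be sketched in miniature. Auditude’s actual technology is proprietary, so everything below is an invented stand-in: the "audio" is a toy feature sequence, and the windowed hashing and the match-fraction score are assumptions chosen only to show the shape of the process (fingerprint the clip, compare against a reference library, report matches).

```python
import hashlib

def fingerprints(signal, window=4):
    """Hash overlapping windows of a (toy) audio feature sequence."""
    prints = set()
    for i in range(len(signal) - window + 1):
        chunk = ",".join(str(v) for v in signal[i:i + window])
        prints.add(hashlib.sha1(chunk.encode()).hexdigest()[:8])
    return prints

def match_scores(clip, reference_library):
    """Fraction of a clip's fingerprints found in each reference work."""
    clip_fp = fingerprints(clip)
    return {
        title: len(clip_fp & ref_fp) / max(len(clip_fp), 1)
        for title, ref_fp in reference_library.items()
    }

# Hypothetical reference library of copyrighted works (feature sequences).
library = {
    "show_a": fingerprints([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]),
    "show_b": fingerprints([2, 7, 1, 8, 2, 8, 1, 8, 2, 8]),
}
# An uploaded clip that reuses a stretch of show_a's audio track.
clip = [1, 5, 9, 2, 6, 5]
scores = match_scores(clip, library)
print(scores)  # show_a matches fully; show_b does not match at all
```

Because every window of the clip also appears in show_a, the clip is flagged as a potential match for that work, while show_b, which shares no windows, scores zero.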

Posted in 2nd Circuit, Case Blurbs, Discovery Requests, Duty to Produce, FRCP 26(c), FRCP 37, Judge Louis L. Stanton, Objections to Discovery Requests, Relevance, S.D.N.Y, Scope of Discovery, Search Engine Technology, Source Code, Technology, Tools, Trade Secrets

Google Search gets Personal

Posted by rjbiii on November 30, 2007

Google is adding functionality to its searches that will allow users some input into the ranking and sorting of results:

Google has rolled out a new option in its Labs-based experimental search program which allows you to rank and re-order search results. The new experiment is reportedly showing up for select users only, but the help page says that the goal is to allow you to “influence your search experience by adding, moving, and removing search results.”

Those of us in EDD are always looking for ways to tweak searches to better fit them to our clients’ needs. It will be interesting to see how easily and effectively users are able to influence the accuracy of these searches, and how soon such technology makes it into review platforms and the like.

Posted in Articles, Search Engine Technology, Search Protocols

Advancing past keywords

Posted by rjbiii on October 1, 2007

Keyword searches are elementary. Perhaps too elementary, according to Law Technology Today, which posts an article praising the benefits of concept search and content analysis, techniques that are sorely under-utilized in today’s discovery projects.

Without endorsement by the legal community, litigation teams have been understandably reluctant to adopt concept searching and data analytics in their discovery strategies. Beyond the normal fear of the unknown, attorneys and those who support them have articulated concerns that this new technology may not be defensible. If attorneys were eliminated from the process there would be unacceptable risk. They’ve also expressed fear and disdain for what they consider to be “black box” technology.

In fact, when concept search and content analysis are done properly, [certain attendant] concerns should go away.

We’ll see more and more articles in the legal media about this and similar issues. Searching technology and techniques are part of the reason that, handled correctly, dealing with ESI is actually preferable to dealing with paper.

[HT: Information Governance Engagement Area]

Posted in Articles, Search Engine Technology, Trends