Friday, May 25, 2007

What is E-Discovery 2.0?

In a previous post, I wrote about the forces transforming e-discovery, a phenomenon that has received increasing attention from the press, most recently in this week’s Economist magazine. While everyone agrees that something big has changed, and (generally speaking) on the reasons why, people struggle to put their finger on exactly what e-discovery has become.

That’s why I think the concept of “E-Discovery 2.0” is so helpful. Analogous to Web 2.0, E-Discovery 2.0 is a set of new processes, technologies, and services that enable companies to manage huge volumes of data, lower costs, and meet tight deadlines.

New Processes

When e-discovery meant handing over a few boxes of paper, companies did not need much of a process. But in today’s world, where it involves terabytes of data, teams of reviewers, and precious little time, it is a very different story. To cope with the growing volume and complexity of e-discovery issues, companies have had no choice but to adopt new processes. These include:

  • Collect and Preserve: Most companies have now established procedures so that, when the need arises, they can collect all data relevant to a case and ensure that it cannot be changed or deleted.

  • Analyze Up Front: When presented with more work than can be done, a company’s only option is to work smarter, not harder. That means analyzing the collected data up front, to cull it down to only those emails and documents directly relevant to the case at hand.

  • Collaborate Efficiently: E-Discovery has become a team sport. And whenever you have a team, you need a playbook, or a process, to ensure work is not repeated and that everyone is marching towards the same goal.
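The post does not say how companies implement the first two steps, so here is a minimal sketch under stated assumptions. The `Message` record and the helpers `fingerprint`, `is_unchanged` and `cull` are hypothetical. A SHA-256 digest taken at collection time is one common way to show later that a preserved copy has not changed. The culling filter uses custodian, date range and keywords, which is only one simple form of up-front analysis.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Message:
    """Hypothetical minimal record for one collected email."""
    custodian: str
    sent: date
    body: str


def fingerprint(msg: Message) -> str:
    """SHA-256 digest of the message, recorded at collection time."""
    data = f"{msg.custodian}|{msg.sent.isoformat()}|{msg.body}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def is_unchanged(msg: Message, recorded_digest: str) -> bool:
    """Preservation check: the copy still matches its collection-time digest."""
    return fingerprint(msg) == recorded_digest


def cull(messages, custodians, start, end, terms):
    """Keep messages from the named custodians, inside the date range,
    that mention at least one search term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return [
        m for m in messages
        if m.custodian in custodians
        and start <= m.sent <= end
        and any(t in m.body.lower() for t in lowered)
    ]


# Hypothetical collection: three messages, one of them outside the date range.
msgs = [
    Message("alice", date(2006, 3, 1), "Q1 pricing memo"),
    Message("bob", date(2006, 3, 2), "Lunch on Friday?"),
    Message("alice", date(2005, 1, 1), "2005 pricing draft"),
]
digests = {m: fingerprint(m) for m in msgs}
relevant = cull(msgs, {"alice", "bob"}, date(2006, 1, 1), date(2006, 12, 31), ["pricing"])
```

Only the 2006 pricing memo survives the cull. Any edit to a preserved message changes its digest, so `is_unchanged` would then return `False`.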

New Technologies

If technology created this problem, by making electronic communication so pervasive and voluminous, then it can also solve it. In recent years, several new technologies have arisen that enable companies to store and sift through their data to fulfill e-discovery obligations. The most significant of these trends include:

  • From tape to disk: As the cost of disk storage has continued to decline, more and more companies are abandoning tapes and instead keeping their data online. Email archiving software optimizes for storage efficiency, allowing companies to keep hundreds of terabytes of data readily available for e-discovery.

  • From search to analysis: Basic keyword search has evolved into sophisticated analysis technology that mines email metadata for relevance, links messages together into discussion threads, and groups them by topic. These analysis applications allow users to sift through millions of messages in minutes, to rapidly identify, tag, and export relevant data.

  • From closed systems to open standards: Until recently, technology providers made no effort to integrate their applications, leaving customers to fend for themselves. But that has started to change. Symantec Enterprise Vault and HP RISS now have open APIs, creating pressure on others to follow suit. George Socha’s Electronic Discovery Reference Model (EDRM), a standards body, has received widespread support, accelerating progress towards creation of an open e-discovery platform.
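To show one piece of that analysis, here is a minimal sketch of the threading step alone. It assumes each message carries Message-ID and In-Reply-To values, modeled here as hypothetical plain dictionaries. It is not how any particular product works, and real tools may use additional signals such as the References header or subject lines.

```python
from collections import defaultdict


def thread_root(msg_id, parent_of):
    """Follow In-Reply-To links upward until the parent is missing
    from the collection (or a cycle is detected)."""
    seen = {msg_id}
    parent = parent_of.get(msg_id)
    while parent is not None and parent in parent_of and parent not in seen:
        seen.add(parent)
        msg_id = parent
        parent = parent_of.get(msg_id)
    return msg_id


def build_threads(messages):
    """Group message IDs by the root message of their reply chain."""
    parent_of = {m["message_id"]: m.get("in_reply_to") for m in messages}
    threads = defaultdict(list)
    for m in messages:
        threads[thread_root(m["message_id"], parent_of)].append(m["message_id"])
    return dict(threads)


# Hypothetical sample: a <- b <- c form one chain; "e" replies to a
# message that was never collected, so it starts its own thread.
sample = [
    {"message_id": "a", "in_reply_to": None},
    {"message_id": "b", "in_reply_to": "a"},
    {"message_id": "c", "in_reply_to": "b"},
    {"message_id": "d", "in_reply_to": None},
    {"message_id": "e", "in_reply_to": "x"},
]
threads = build_threads(sample)
```

Five messages collapse into three threads. A reviewer can then read each conversation once instead of opening every reply separately.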

To anyone working in litigation support, legal, or information security, all this is quite unremarkable. Of course they use technology to address e-discovery. Obviously, there has to be a process. From the company’s perspective, e-discovery has become no different to HR or finance – it is a core competency, part of doing business.

And that, perhaps, is the most remarkable thing about E-Discovery 2.0: in only a few short years, it has become so widespread and so deeply entrenched within the enterprise that people barely notice it.

Sunday, May 20, 2007

Can E-Discovery Really Be That Expensive?

I tend to have a "Mark Twain perspective" on statistics and apply a healthy grain of salt to any numbers quoted by analysts and industry experts. But when end-users speak, I sit up and listen. That's why I was very interested to read in a recent article that Microsoft "spends an average of US$ 20 million for e-discovery per litigation, according to one company exec." (My thanks to George for alerting me to the article.)

If true, it is an astounding number, but one that is quite consistent with what we have seen firsthand working with other large enterprises. Once you factor in processing costs (an average of $1,800 per GB), review costs ($200/hour), and the huge volume of information being generated and stored, you can get up to $20 million on a single case surprisingly fast.
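Here is a back-of-the-envelope sketch of how quickly those two rates add up. The per-GB and per-hour figures are the averages quoted above. The case size (2,000 GB processed, 80,000 reviewer-hours) is purely hypothetical and was chosen only for illustration. It is not drawn from Microsoft or any real matter.

```python
PROCESSING_PER_GB = 1_800  # average processing cost quoted above, US$ per GB
REVIEW_PER_HOUR = 200      # review cost quoted above, US$ per hour


def case_cost(gb_processed: float, review_hours: float) -> float:
    """Processing plus review cost for one case, in US$."""
    return gb_processed * PROCESSING_PER_GB + review_hours * REVIEW_PER_HOUR


# Hypothetical case: 2,000 GB processed and 80,000 reviewer-hours.
total = case_cost(2_000, 80_000)
print(f"${total:,.0f}")
```

Processing accounts for $3.6 million and review for $16 million, so even this modest hypothetical case lands at $19.6 million. Review time, not storage, dominates the bill.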

Sunday, May 13, 2007

The White House And The Problem of A Billion Emails

The other day, Michael Clark of EDDix sent me a fascinating academic paper (thanks, Michael!) about “information inflation” and its impact on the legal system. I had never thought of it this way, but there have really only been 3 significant events in the evolution of information:

  1. Writing (c. 5,000 years ago): Prehistoric man started to etch his markings on clay tablets, stone, wax, papyrus, bark, cloth, wood, paper, cave walls and anything else that came to hand.
  2. Printing (c. 1450): Gutenberg’s movable type printing press enabled mass production of information, contributing to (among other things) the Renaissance and the Scientific Revolution.
  3. Digitization (c. late 20th Century): The personal computer, wide area networks, the internet, and email have all led to a massive explosion of information in the past 50 years. As the article points out, “close to 100 billion emails are sent daily…In a small business, whereas formerly there was usually 1 four-drawer file cabinet full of paper records, now there is the equivalent of 2,000 four-drawer file cabinets full of such records, all contained in a cubic foot or so in the form of electronically stored information.”

How can the legal profession cope, given that a lawyer’s job is often to synthesize this mind-boggling amount of data? Fortunately, the authors have a solution:

“A family of computer technology employing new types of search methods and techniques beyond use of mere keywords should now be considered for use in litigation….Litigators can no longer depend on manual review alone. It is too time-consuming and expensive – with cost often exceeding the amounts in dispute.”

To illustrate its point, the paper tells the story of the White House and the problem of a billion emails. During the Clinton administration, the White House agreed to a form of electronic record keeping called ARMS (Automated Records Management System). At the end of each administration, these records are handed over to the National Archives and Records Administration (NARA). The table below shows the number of stored emails NARA has, or expects to receive at the end of each administration.

Now assume that, like previous administrations, the Next President’s administration is subject to a lawsuit that requires e-discovery. The paper calculates:

“Without employing any automated computer process to generate potentially responsive documents, the review effort for this litigation would take 100 people, working 10 hours a day, 7 days a week, 52 weeks a year, over 54 years to complete. And the cost of such a review, at an assumed billing rate of $100/hour, would be $2 billion. Even, however, if present day search methods are used to initially reduce the email universe to 1% of its size (i.e., 10 million documents out of 1 billion), the case would still cost $20 million for a first pass review conducted by 100 people over 28 weeks, without accounting for any additional privilege review.”
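The paper's figures can be checked with a few lines of arithmetic. The quoted passage does not state a review speed. The rate of 50 documents per reviewer-hour used below is an assumption, back-derived from the paper's own numbers: 1 billion emails over 54+ years at 100 people × 70 hours a week works out to roughly 50 documents an hour. `review_effort` is a hypothetical helper.

```python
DOCS_PER_HOUR = 50       # assumption: implied by the paper's figures, not stated in it
BILLING_RATE = 100       # US$ per hour, as assumed in the paper
REVIEWERS = 100          # people on the review team
HOURS_PER_WEEK = 10 * 7  # 10 hours a day, 7 days a week
WEEKS_PER_YEAR = 52


def review_effort(num_docs):
    """Return (reviewer-hours, cost in US$, calendar weeks) for a linear review."""
    hours = num_docs / DOCS_PER_HOUR
    cost = hours * BILLING_RATE
    weeks = hours / (REVIEWERS * HOURS_PER_WEEK)
    return hours, cost, weeks


for docs in (1_000_000_000, 10_000_000):
    hours, cost, weeks = review_effort(docs)
    print(f"{docs:,} docs: ${cost:,.0f}, {weeks:,.1f} weeks "
          f"({weeks / WEEKS_PER_YEAR:.1f} years)")
```

Under that assumption, the full billion emails cost $2 billion and take about 54.9 years. Culling to 1% brings the cost to $20 million and the schedule to about 28.6 weeks, matching the paper's numbers.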

This is a great example of why companies and government agencies are adopting e-discovery 2.0 technologies that go far beyond keyword search. In the face of information inflation, what choice do they have?