
Friday, May 25, 2007

What is E-Discovery 2.0?

In a previous post, I wrote about the forces transforming e-discovery, a phenomenon that has received increasing attention from the press, most recently in this week’s Economist magazine. While everyone agrees that something big has changed, and (generally speaking) on the reasons why, people struggle to put their finger on exactly what e-discovery has become.


That’s why I think the concept of “E-Discovery 2.0” is so helpful. Analogous to Web 2.0, E-Discovery 2.0 is a set of new processes, technologies, and services that enable companies to manage huge volumes of data, lower costs, and meet tight deadlines.


New Processes


When e-discovery meant handing over a few boxes of paper, companies did not need much of a process. But in today’s world, where it involves terabytes of data, teams of reviewers, and precious little time, it is a very different story. To cope with the growing volume and complexity of e-discovery issues, companies have had no choice but to adopt new processes. These include:


  • Collect and Preserve: Most companies have now established procedures so that, when the need arises, they can collect all data relevant to a case and ensure that it cannot be changed or deleted.

  • Analyze Up Front: When presented with more work than can be done, a company’s only option is to work smarter, not harder. That means analyzing the collected data up front, to cull it down to only those emails and documents directly relevant to the case at hand.

  • Collaborate Efficiently: E-Discovery has become a team sport. And whenever you have a team, you need a playbook, or a process, to ensure work is not repeated and that everyone is marching towards the same goal.

New Technologies


If technology created this problem, by making electronic communication so pervasive and voluminous, then it can also solve it. In recent years, several new technologies have arisen that enable companies to store and sift through their data to fulfill e-discovery obligations. The most significant of these trends include:


  • From tape to disk: As the cost of disk storage has continued to decline, more and more companies are abandoning tapes and instead keeping their data online. Email archiving software optimizes for storage efficiency, allowing companies to keep hundreds of terabytes of data readily available for e-discovery.

  • From search to analysis: Basic keyword search has evolved into sophisticated analysis technology that mines email meta-data for relevance, links messages together into discussion threads, and groups them by topics. These analysis applications allow users to sift through millions of messages in minutes, to rapidly identify, tag, and export relevant data.

  • From closed systems to open standards: Until recently, technology providers made no effort to integrate their applications, leaving customers to fend for themselves. But that has started to change. Symantec Enterprise Vault and HP RISS now have open APIs, creating pressure on others to follow suit. George Socha’s Electronic Discovery Reference Model (EDRM), a standards body, has received widespread support, accelerating progress towards creation of an open e-discovery platform.

To anyone working in litigation support, legal, or information security, all this is quite unremarkable. Of course they use technology to address e-discovery. Obviously, there has to be a process. From the company’s perspective, e-discovery has become no different to HR or finance – it is a core competency, part of doing business.


And that, perhaps, is the most remarkable thing about E-Discovery 2.0 – in only a few short years, it has become so widespread and deeply entrenched within the enterprise, that people barely notice it.

Sunday, May 20, 2007

Can E-Discovery Really Be That Expensive?

I tend to have a "Mark Twain perspective" on statistics and apply a healthy grain of salt to any numbers quoted by analysts and industry experts. But when end-users speak, I sit up and listen. That's why I was very interested to read here that Microsoft "spends an average of US$ 20 million for e-discovery per litigation, according to one company exec." (My thanks to George for alerting me to the article.)

If true, it is an astounding number - but one that is quite consistent with what we have seen first hand working with other large enterprises ourselves. Once you factor in processing costs (an average of $1,800 per GB), review costs ($200/hour), and the huge volume of information being generated and stored, you can get up to $20 million on a single case surprisingly fast.
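
To see how quickly the numbers add up, here is a rough back-of-the-envelope sketch. The processing and review rates are the ones quoted above; the data volume and reviewer hours are purely hypothetical inputs chosen for illustration, not figures from Microsoft or any real case.

```python
# Back-of-the-envelope e-discovery cost estimate.
PROCESSING_COST_PER_GB = 1_800  # dollars per GB, average quoted in the post
REVIEW_COST_PER_HOUR = 200      # dollars per hour, rate quoted in the post


def estimate_case_cost(data_gb, review_hours):
    """Return (processing, review, total) cost in dollars."""
    processing = data_gb * PROCESSING_COST_PER_GB
    review = review_hours * REVIEW_COST_PER_HOUR
    return processing, review, processing + review


# ASSUMPTION: 5,000 GB collected and 55,000 reviewer hours (hypothetical).
processing, review, total = estimate_case_cost(data_gb=5_000, review_hours=55_000)
print(f"Processing: ${processing:,}")
print(f"Review:     ${review:,}")
print(f"Total:      ${total:,}")
```

With those assumed inputs, processing comes to $9 million and review to $11 million, for a total of $20 million on a single case.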

Sunday, May 13, 2007

The White House And The Problem of A Billion Emails

The other day, Michael Clark of EDDix sent me a fascinating academic paper (thanks, Michael!) about “information inflation” and its impact on the legal system. I had never really thought of it this way, but there have really only been 3 significant events in the evolution of information:

  1. Writing (c. 5,000 years ago): Pre-historic man started to etch his markings on clay tablets, stone, wax, papyrus, bark, cloth, wood, paper, cave walls and anything else that came to hand.
  2. Printing (c. 1450): Gutenberg’s movable type printing press enabled mass production of information, contributing to (among other things) the Renaissance and the Scientific Revolution.
  3. Digitization (c. late 20th Century): The personal computer, wide area networks, the internet, and email have all led to a massive explosion of information in the past 50 years. As the article points out, “close to 100 billion emails are sent daily…In a small business, whereas formerly there was usually 1 four-drawer file cabinet full of paper records, now there is the equivalent of 2,000 four-drawer file cabinets full of such records, all contained in a cubic foot or so in the form of electronically stored information.”

How can the legal profession cope, given that a lawyer’s job is often to synthesize this mind-boggling amount of data? Fortunately, the authors have a solution:

“A family of computer technology employing new types of search methods and techniques beyond use of mere keywords should now be considered for use in litigation….Litigators can no longer depend on manual review alone. It is too time-consuming and expensive – with cost often exceeding the amounts in dispute.”

To illustrate its point, the paper tells the story of the White House and the problem of a billion emails. During the Clinton administration, the White House agreed to a form of electronic record keeping called ARMS (Automated Records Management System). At the end of each administration, these records are handed over to the National Archives and Records Administration (NARA). The table below shows the number of stored emails NARA has, or expects to receive at the end of each administration.



Now assume that, like previous administrations, the Next President’s administration is subject to a lawsuit that requires e-discovery. The paper calculates:

“Without employing any automated computer process to generate potentially responsive documents, the review effort for this litigation would take 100 people, working 10 hours a day, 7 days a week, 52 weeks a year, over 54 years to complete. And the cost of such a review, at an assumed billing rate of $100/hour, would be $2 billion. Even, however, if present day search methods are used to initially reduce the email universe to 1% of its size (i.e., 10 million documents out of 1 billion), the case would still cost $20 million for a first pass review conducted by 100 people over 28 weeks, without accounting for any additional privilege review.”
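
The paper's arithmetic can be checked with a short sketch. The staffing, hours, and billing rate come from the quote above; the review speed of 50 documents per reviewer-hour is my own assumption, inferred because it makes the quoted totals work out, and is not stated in the passage.

```python
# Rough check of the review-cost figures quoted from the paper.
HOURLY_RATE = 100    # dollars per reviewer-hour (from the quote)
REVIEWERS = 100      # people on the review team (from the quote)
HOURS_PER_DAY = 10   # from the quote
DAYS_PER_WEEK = 7    # from the quote
WEEKS_PER_YEAR = 52  # from the quote
DOCS_PER_HOUR = 50   # ASSUMPTION: inferred review speed, not in the quote


def review_estimate(num_docs):
    """Return (cost in dollars, calendar weeks) to review num_docs."""
    reviewer_hours = num_docs / DOCS_PER_HOUR
    cost = reviewer_hours * HOURLY_RATE
    team_hours_per_week = REVIEWERS * HOURS_PER_DAY * DAYS_PER_WEEK
    weeks = reviewer_hours / team_hours_per_week
    return cost, weeks


full_cost, full_weeks = review_estimate(1_000_000_000)  # all 1 billion emails
culled_cost, culled_weeks = review_estimate(10_000_000)  # culled to 1%

print(f"Full review:   ${full_cost:,.0f}, {full_weeks / WEEKS_PER_YEAR:.1f} years")
print(f"Culled review: ${culled_cost:,.0f}, {culled_weeks:.1f} weeks")
```

Under that assumption, the full review costs $2 billion and takes just under 55 years, while the culled review costs $20 million and takes between 28 and 29 weeks, in line with the paper's figures.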

This is a great example of why companies and government agencies are adopting e-discovery 2.0 technologies that go far beyond keyword search. In the face of information inflation, what choice do they have?

Thursday, April 19, 2007

From Web 2.0 To E-Discovery 2.0

If there’s one idea that has captivated Silicon Valley in the past 3 years, it is Web 2.0. People may debate its meaning and definition, but the gist of it is clear: a handful of powerful forces have coalesced to make the internet of today fundamentally different to what it was 5 years ago. Opinions vary on which of these forces is most important: the growth of broadband to the home; open source, ajax and other technologies which lower the cost and increase the functionality of web applications; the power of community in a world where more people are on the web. Whichever you choose, there is no doubt that collectively these forces have had a huge impact, powering the growth of now-household names such as Google, MySpace, and YouTube.

I believe that an analogous set of changes is transforming the way companies do e-discovery. Ten years ago, e-discovery was an after-thought – a necessary, but incidental, part of corporate legal expenses. Today, it is a huge line-item in the legal budget, a headache for corporate IT, and the foundation upon which many cases are built.

E-discovery 1.0 was an ad hoc activity; e-discovery 2.0 is a core business process. E-discovery 1.0 was barely noticed; e-discovery 2.0 is driving the news cycle, affecting everyone from Intel to the US Attorney General. In the legal world, e-discovery 2.0 has had every bit as big an impact on enterprises as Web 2.0 has had on the dating lives of teenagers.

What happened? A series of fundamental changes have made e-discovery far more important, expensive, and complex than it was in the 1990s. Chief among these changes are:

1. Email, Not Voicemail: In the past 10 years, companies have switched from voicemail to email as the primary way they communicate. This has created a written record where none previously existed. Just as oral histories eventually die out, every voicemail eventually gets deleted; but emails and the written word live forever. What’s more, the convenience and time-efficiency of email makes it addictive, with the result that every meaningful conversation is captured, time-stamped, and attached to a person’s name. Given that many legal cases turn on intent, and proving who knew what when, this makes email a virtual treasure trove for anyone building a case.

2. Electronic Files, Not Paper: Electronic files are fundamentally different to paper documents: they reproduce like rabbits and are far cheaper to store. For example, one laptop is the equivalent of 2,000 boxes of paper; one server corresponds to 8,000-40,000 boxes of paper. The number of servers and laptops holding vast quantities of email is only increasing as the cost of hard disk storage falls, down from $2.04 per GB in 2004 to $0.77 per GB in 2006. Net net: going electronic has vastly increased the amount of data that must be analyzed as part of the discovery process.

3. Sooner, Not Later: Recent changes to the FRCP guidelines have moved e-discovery up in the process, forcing companies to have an e-discovery plan within 99 days of a suit being filed. Since disputes rarely settle that quickly, that means enterprises must now incur the expense of e-discovery on every case, not just the small number that actually make it to court. The result is a massive increase in e-discovery expenses and workload.

Anecdotal evidence of e-discovery 2.0 is everywhere. A few years back, no one would have guessed that every major analyst firm would have people dedicated to tracking e-discovery. Nor would you have expected to find a litigation support manager at every major enterprise.

So what exactly is e-discovery 2.0? Well, I will talk about that in a future post.

Monday, April 16, 2007

eDiscovery In The Blogosphere

It has now been over a month since I started blogging about Email Intelligence and eDiscovery, and perhaps the most pleasant surprise has been to find that I am not alone. As the chart below shows, there has been an explosion of activity around eDiscovery in the blogosphere since the FRCP Rule changes on December 1, 2006, with the happy result that today there are several voices which are well worth listening to.



To assist you in your travels, I offer a brief (and by no means comprehensive) guide to the blogs which have caught my eye. In general, they fall into 3 categories:


1. Messaging Mavens: For an entertaining look at email in the news, I would suggest Roger Matus’ Death by Email, which has everything from videos to colorful commentary. In a similar vein, Chris Foreman’s Messaging Mogul offers an interesting perspective on relevant technologies, in a way that is refreshingly free of the usual mind-numbing marketing-speak.


2. Legal Eagles: There are many lawyers who blog, often covering arcane topics or issues particular to a specific industry. But the one general, business-oriented legal blog that I would recommend is Andrew Cohen’s blog, which makes a range of complicated legal topics accessible to the general reader.


3. Article Clippers: Finally, there are the folks who helpfully collect interesting articles from around the web into a single place, so you can get a filtered view of the latest eDiscovery stories. Foremost among these is Jeff Fehrman and Bob Krantz’s edd blog online which focuses on eDiscovery and forensics.


I do not pretend to have anything close to a complete list. So if there are others worthy of a mention, please add them as a comment so that I can update my list.

Wednesday, April 4, 2007

Go Ahead, Sue Me!


It is a truism to say that it is easier to dispense advice than to follow it, and with good reason. How many venture capital firms practice the financial discipline they preach to their portfolio companies? How many management consulting companies employ the innovative management theories they advocate to their clients? And how many technology companies actually leverage leading-edge technology to solve their own business problems?


The answer, at least based on my experience, is “not very many”. For example, if you look at Silicon Valley’s leading technology companies, the vast majority do not have an e-discovery solution in place. Yes, there are some exceptions but for the most part, when it comes to e-discovery, the likes of eBay, Google, Yahoo, and (until recently) Intel have preferred to muddle through with manual, error-prone, expensive processes.


The justifications are typically the same. Some technology companies argue that they don’t need a legal discovery solution because theirs is not a litigious industry; others say they delete everything off their Exchange servers within three weeks and so don’t have any email to discover; all agree that things like email and document retention policies are needlessly bureaucratic.


The danger of this “we-don’t-need-car-insurance-because-we-will-never-have-an-accident” approach has been brutally exposed in the past few weeks by the painful experience of Intel. In case you missed the press coverage: AMD sued Intel for anti-trust violations. Like any company on the receiving end of a subpoena, Intel was obliged to provide opposing counsel with all email and documents relevant to the case.


If Intel had an e-discovery solution, that would have been a straightforward process. Intel’s IT group would simply identify a group of messages by date range, person, and perhaps keyword within their larger email archive. The legal group would then use an analysis product to cull down the messages to only those relevant to the case. The whole thing would take a few days. But that’s not what happened. Since Intel did not have an e-discovery solution, the company had no simple way to preserve and analyze the relevant data. Intel’s legal department was obliged to inform over a thousand employees that they could no longer delete data at will. Somewhere along the line, the message did not get through and employees kept on deleting. As a result, Intel was forced to go back to the judge with the proverbial “the dog ate my homework” defense, while AMD cried foul.


How much this costs Intel is yet to be determined. But my guess is that they will end up spending more on lawyers to fix the mess than they would have spent on an e-discovery solution that would have avoided the problem to begin with.


While I have given up on venture capitalists and management consultants, I remain optimistic that the technology industry will practice what it preaches and leverage technology to solve its own business problems in e-discovery. As Intel discovered, it is not enough to have smart lawyers on staff. You also need to equip them with an e-discovery solution that allows them to preserve and analyze information relevant to the case.


To do otherwise is an open invitation to your competitors to sue you. Just ask Larry Ellison – or better yet, SAP.

Thursday, March 29, 2007

Analyze Email First, Talk Later

More than perhaps any other type of case that companies deal with, employment disputes often boil down to “he said, she said”. Since witnesses are rarely present, both sides then quickly look to email for supporting evidence. This usually happens behind closed doors, giving the innocent bystander no visibility into how the process works. That’s why the case of the Justice Department and the 8 fired US Attorneys is so interesting – it illustrates what happens every day in similar, less high-profile employment disputes.


The questions are always the same:

1. Who made the decision? Initially, the answer given was Kyle Sampson, the Attorney General’s Chief of Staff. But email told a different story, showing that the Attorney General was involved, something that Mr. Sampson later confirmed in Congressional Testimony.

2. Why were the people fired? Initially, the answer was “performance reasons”. But internal department e-mail messages show consideration was also given to the views of senators, administration policy priorities, and legislative goals.

3. Was the decision justified? Well, that’s where the evidence stops, and human judgment comes in. Supporters of the decision would say it was perfectly justified, but poorly executed; opponents would argue it is more evidence of politicizing the judiciary.


As in other employment disputes that occur outside of the public eye, email analysis takes the “he said, she said” out of the situation. If you know who made the decision and why, it becomes much easier to decide whether the action was appropriate. The current difficulties at the Justice Department stem as much from the fact that they did not analyze their emails before making public statements as they do from what they actually did.

Sunday, March 25, 2007

FRCP Rule Changes: What’s The Big Deal?

As any venture capitalist will tell you, there are two forces which open the window to creating huge, new businesses. The first is a technological breakthrough – think internet, the microprocessor, or mapping the human genome. The second is a change in the regulatory environment, such as airline/telecom deregulation, the new subsidies fueling the boom in clean energy – and the new Federal Rules of Civil Procedure (FRCP), which came into effect on December 1, 2006.


In the legal world, the new FRCP guidelines are a HUGE deal: it is the first time they have changed in 38 years, which is perhaps not surprising since they require approval from Congress and the Supreme Court. As you would expect from our greatest legal minds, the Rules themselves are long, complicated, and (for most people) the perfect antidote to insomnia. But from business’ perspective, the net effect of the changes is pretty simple: there will be a lot more e-discovery.


To understand why, consider the average company with revenues over $1B. According to a recent survey, this “average company” is concurrently managing 556 cases. If you assume that 50% of its cases settle before going to court, then before the FRCP rule changes this company would only have been doing e-discovery on 278 cases.


That all changed on December 1, 2006, when Rules 16 and 26 were amended to provide the court early notice of e-discovery issues. Under Rule 16(b), parties must “meet and confer” at least 21 days before the scheduling conference which, in turn, must occur within 120 days of filing a lawsuit. Rule 16(b) further states that the scheduling order must include “provisions for disclosure or discovery of electronically stored information”, while Rule 26(f) requires that parties “discuss any issues relating to preserving discoverable information and to develop a proposed discovery plan.”


The bottom line: companies can no longer leave e-discovery for later in the process. Thanks to the FRCP rule changes, they must now define and share their e-discovery plans at the “meet and confer” which occurs within the first 99 days of a case. Since cases rarely settle that quickly, our “average company” is now obliged to do e-discovery on all 556 of its concurrent cases, not just the 278 that do not settle. For the corporate legal department, that means their e-discovery workload has doubled overnight, with no increase in manpower to cope with the extra work. So companies are forced to reconsider their e-discovery process and look for ways to leverage technology to cope in a post-FRCP-rule-changes world.
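
The arithmetic behind that conclusion is simple enough to sketch. The case count, the 120-day and 21-day figures, and the 50% settlement rate all come from the text above; the settlement rate is itself an assumption made for the sake of argument, and the variable names are mine.

```python
# Worked arithmetic for the "workload doubled overnight" argument.
CONCURRENT_CASES = 556                     # survey figure cited in the post
SETTLE_BEFORE_COURT = 0.50                 # ASSUMPTION made in the post
SCHEDULING_CONFERENCE_DEADLINE_DAYS = 120  # days after filing, per the post
MEET_AND_CONFER_LEAD_DAYS = 21             # days before the conference

# Before the rule changes: e-discovery only on cases that do not settle.
cases_before = round(CONCURRENT_CASES * (1 - SETTLE_BEFORE_COURT))

# After the rule changes: an e-discovery plan is needed on every case.
cases_after = CONCURRENT_CASES

# Latest day on which the "meet and confer" can take place.
meet_and_confer_deadline = (
    SCHEDULING_CONFERENCE_DEADLINE_DAYS - MEET_AND_CONFER_LEAD_DAYS
)

print(f"Cases needing e-discovery before: {cases_before}")
print(f"Cases needing e-discovery after:  {cases_after}")
print(f"Workload multiplier:              {cases_after / cases_before:.1f}x")
print(f"Meet-and-confer deadline:         day {meet_and_confer_deadline}")
```

Changing the settlement assumption changes the multiplier: the more cases that used to settle before discovery, the bigger the jump in workload.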



In case you were wondering, I did not figure this out for myself. It was first explained to me by the folks over at the Nassau County Attorney's Office, and has since been echoed by many of our corporate customers.

Friday, March 16, 2007

Email, Politics, And The Media


I will let others better qualified than me comment on the political implications of the recent furore over the firing of several US Attorneys. But one interesting aspect of the story from my perspective has been seeing email front-and-center in the news cycle.

Everyone from CNN to the Washington Post to the Wall Street Journal has led with “email-driven” stories, with headlines like “Rove, Gonzales discussed firings, e-mails show”. On March 14, the Journal (subscription required) provided this chart and reported:

Emails between White House aides and Attorney General Alberto Gonzales's chief of staff show an orchestrated effort to fire several U.S. attorneys, counter to Mr. Gonzales's previous assertions that the firings weren't instigated by the White House.

Today, the Washington Post led with (bold and underlines are added by me):

The Justice Department advocated in early 2005 removing up to 20 percent of the nation's U.S. attorneys whom it considered to be "underperforming" but retaining prosecutors who were "loyal Bushies," according to e-mails released by Justice late yesterday.

The three e-mails also show that presidential adviser Karl Rove asked the White House counsel's office in early January 2005 whether it planned to proceed with a proposal to fire all 93 federal prosecutors. Officials said yesterday that Rove was opposed to that idea but wanted to know whether Justice planned to carry it out.

The e-mails provide new details about the early decision-making that led to the firings of eight U.S. attorneys last year, indicating that Justice officials endorsed a larger number of firings than has been disclosed and that Rove expressed an early interest in the debate over the removals.

Setting aside the politics of all this, the press is using email to address two questions: who was involved, and are their public statements accurate? This is very similar to how I see email being used every day in the corporate world. Any legal proceeding or corporate investigation centers on understanding who knew what and when – and email is the place lawyers and investigators go to find that out.

Why? Because the beauty of email is that it is the source of truth, the indisputable statement of record. No need to ask people for incomplete recollections, no need to filter out the spin; just analyze their email and you will find out who did what – and perhaps even get a window into how they decided to do it.

Thursday, March 15, 2007

“I Missed The Boat”

The other day, I heard that a local technology company had a lot of pain around eDiscovery. Within hours, one of our board members had contacted the General Counsel who informed us that they had just purchased a product for eDiscovery 3 weeks prior.


That evening over dinner, I summarized the situation by saying to my wife that “I missed the boat.” My 3-year-old son, who was also at the table, immediately started to quiz me: “You missed the boat? The boat left without you? You were late so the boat had to go? There wasn’t room on the boat for you?” For days afterwards, when I left for work, he would ask me: “Are you going on the boat today?”


The thing that really struck me is that we often speak in metaphors, and it is not just 3-year-olds who have trouble understanding. One of our customers is a large manufacturing company. On deploying Clearwell to analyze its email, the company discovered a large number of messages with the expression: “The eagle has landed”. That struck them as rather odd, so they investigated and discovered a group of employees were illegally selling company equipment on the grey market and every time they made a sale, they would let those involved know by sending out an email saying “The eagle has landed.” Viewed alone, the expression looks innocent enough; once viewed in the context of emails flowing back and forth, it was clearly a statement of guilt.


Examples like this illustrate why simple keyword search is not enough. Companies need more sophisticated tools which, among other things, group emails into topic areas and link them into discussion threads, to surface coded expressions and analyze them in context. To rely on keyword search alone would be like someone at Procter and Gamble seeking to analyze point-of-sale data from Walmart with a pocket calculator.
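
To make the idea concrete, here is a toy sketch of one way such a tool might work: link messages into threads by their normalized subject line, then flag any phrase that recurs across unrelated threads. This is an illustration under my own assumptions, not a description of how Clearwell or any other product actually does it. The sample messages, function names, and the four-word phrase length are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample messages, invented for illustration only.
MESSAGES = [
    {"subject": "Q3 forecast", "body": "Numbers attached for review."},
    {"subject": "RE: Q3 forecast", "body": "Thanks, the eagle has landed."},
    {"subject": "Lunch", "body": "The eagle has landed. Usual place."},
    {"subject": "Fwd: Lunch", "body": "See below."},
    {"subject": "Pickup", "body": "Done. The eagle has landed."},
]

# Matches one or more leading reply/forward prefixes such as "RE:" or "Fwd:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject line so replies and forwards share one key."""
    return _PREFIX.sub("", subject).strip().lower()


def group_into_threads(messages):
    """Link messages into discussion threads by normalized subject."""
    threads = defaultdict(list)
    for msg in messages:
        threads[thread_key(msg["subject"])].append(msg)
    return dict(threads)


def phrases(text, n=4):
    """Return the set of n-word phrases that appear in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def recurring_phrases(threads, n=4, min_threads=2):
    """Find phrases that show up in at least min_threads different threads."""
    seen_in = defaultdict(set)
    for key, msgs in threads.items():
        for msg in msgs:
            for phrase in phrases(msg["body"], n):
                seen_in[phrase].add(key)
    return {p: sorted(keys) for p, keys in seen_in.items()
            if len(keys) >= min_threads}


threads = group_into_threads(MESSAGES)
for phrase, where in recurring_phrases(threads).items():
    print(f"{phrase!r} appears in threads: {where}")
```

A keyword search only finds this phrase if you already know to look for it. Counting what recurs across otherwise unrelated conversations is what lets an odd expression surface on its own, after which a person can read the surrounding threads for context.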


Of course, that doesn’t help me explain missing the boat to my son. So over the weekend, we took him on the ferry between San Francisco and Sausalito – just so he knows I don’t miss the boat every time.