Analytics and e-discovery

With modern data collection and recording of information has come a dramatic rise in the volumes of documents that are potentially discoverable during trial.

Lloyd Gallagher

However, the move to electronic data retention has had the unfortunate result of people being less organised in their filing. Even where an organisation has excellent data retention and filing policies, the volume of data now maintained can run to hundreds of thousands of documents. Getting to the necessary evidence, finding the key documents and eliminating the irrelevant ones can be costly, as well as time-consuming.

Fortune 1000 corporations are estimated to spend US$5-$10 million annually on e-discovery, with several companies reporting in 2014 that the expenses were as high as US$30 million. Some 70 per cent of these costs were attributed directly to the physical review of documents (see the 2012 RAND study in the further reading section at the end of this article). This equates to approximately US$1.8 million per case, or US$18,000 a gigabyte.

To combat the financial and time costs, law firms around the globe have approached the problem using different techniques.

Some have tried data analytics to assist in sifting through the volumes of discovery. Using “scan and sort” techniques or e-discovery collection tools, firms can apply analytic algorithms to large data volumes to collate and organise information into keyword groups or data sets based on the search parameters provided.
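By way of illustration only, the sketch below shows the basic idea behind this kind of keyword-driven collation, assuming documents have already been exported as plain-text files; the folder name and keyword list are hypothetical, and real e-discovery platforms work over far richer file formats and metadata.

```python
# Minimal sketch of keyword-driven "scan and sort" collation.
# The ./export folder and the keyword list are hypothetical.
from pathlib import Path
from collections import defaultdict

KEYWORDS = ["invoice", "settlement", "board minutes"]  # search parameters supplied by counsel

def collate(export_dir: str = "export") -> dict[str, list[str]]:
    """Group document paths under each keyword they contain."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for path in Path(export_dir).glob("**/*.txt"):
        text = path.read_text(errors="ignore").lower()
        for term in KEYWORDS:
            if term in text:
                buckets[term].append(str(path))
    return buckets

if __name__ == "__main__":
    for term, docs in collate().items():
        print(f"{term}: {len(docs)} documents")
```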

To mitigate these costs, US companies have trialled a variety of alternatives to using outside law firms for review tasks, including engaging temporary attorneys or legal process outsourcing providers with stables of contract attorneys.

Unfortunately, given the rates currently paid to attorneys during such large-scale reviews, this approach has not been sufficient to curb the spending on discovery projects. Another option that has been explored in the US is outsourcing to English-speaking lawyers in countries such as India and the Philippines, with local attorneys to oversee the project. Although an outsourced team can cost much less than US counsel, issues of information security, oversight, maintaining attorney-client privilege and logistics have hampered the cost-benefit analysis of this approach.

The RAND study highlighted other inherent dangers in such approaches, ranging from inconsistency to missing vital information due to inexperience. The old adage seems to apply – no lawyer can know what they do not know, or what is relevant until it is relevant. And while we all carry a range of assumptions that lead us to investigate a particular trail, inexperienced counsel may not see what experienced counsel see, leading to vital information being missed. Furthermore, given the physical limitations of reading and comprehension, better organisation of the documents is not likely to correct the problem unless decisions about individual documents can be applied routinely to dozens or hundreds of similar items. And although some document sets may lend themselves to bulk coding, it is unlikely that these techniques would foster significant improvements for most large-scale reviews. Let’s face it, human reviewers are simply prone to inconsistency.

Is there a better way?

It is being argued in some quarters that predictive coding may have some answers. Predictive coding is a type of computer category review that classifies documents according to how well the document matches concepts and terms in sample documents. Using machine learning techniques to continually refine the computer’s classifications with input from users (just as spam filters self-correct to increase the reliability of future decisions about email messages), predictive coding reviews documents to assess their connection to key terms or concepts.

However, before the computer can undertake this assessment for a given document, humans (i.e. lawyers) must first examine samples of documents from the review set (far fewer than the volume to be assessed), make determinations about whether they are relevant, responsive or privileged, and input those categories for the software to record.

Using that input data, the software assigns a score to each document in the review set, reflecting how closely it matches the desired characteristics. The coded sample serves as a template against which any document in the actual review is contrasted; if a document’s score reaches the probability threshold for a match, it is categorised and placed into the bundle the template describes. These bundles can then be reviewed by lawyers as needed.
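A minimal sketch of that scoring step is shown below, assuming the scikit-learn library is available; the seed texts, labels and probability cut-off are invented for illustration, and real predictive-coding platforms use far larger samples and iterative rounds of lawyer feedback.

```python
# Minimal predictive-coding sketch: lawyers code a small seed sample,
# a model scores every document in the review set, and documents at or
# above a probability cut-off are bundled for closer lawyer review.
# The seed texts, labels and cut-off below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_texts = [
    "board approved the asset transfer to the subsidiary",        # coded relevant
    "please review the attached settlement deed before signing",  # coded relevant
    "lunch menu for the staff christmas party",                   # coded not relevant
    "car park allocation for level three",                        # coded not relevant
]
seed_labels = [1, 1, 0, 0]  # 1 = relevant, 0 = not relevant (lawyer decisions)

review_set = [
    "draft settlement deed attached for review",
    "rsvp for the christmas party by friday",
]

vectoriser = TfidfVectorizer()
model = LogisticRegression().fit(vectoriser.fit_transform(seed_texts), seed_labels)

CUTOFF = 0.5  # illustrative probability threshold chosen by the review team
scores = model.predict_proba(vectoriser.transform(review_set))[:, 1]

# Documents at or above the cut-off go to the "likely relevant" bundle;
# the rest are deprioritised rather than discarded.
for doc, score in sorted(zip(review_set, scores), key=lambda pair: -pair[1]):
    bundle = "likely relevant" if score >= CUTOFF else "low priority"
    print(f"{score:.2f}  {bundle}: {doc}")
```

In practice the model is retrained as reviewers confirm or correct its suggestions, which is the self-correcting feedback loop the spam-filter comparison above refers to.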

This approach has the advantage of drastically reducing costs in discovery by having the computer do the initial legwork to put documents into a more manageable order for litigation review.

Pitfalls to watch out for

Machine learning is only as good as the initial set of instructions. The use of inexperienced counsel to set up the templates (as described above) may result in relevant material being misfiled. Therefore, at the outset, experienced counsel must start the template review. It should be kept in mind that human review is still very much in play with predictive coding, and experienced counsel must make the judgement call as to what falls under the categories of relevance, privilege etc. Despite the cost of experienced counsel in this early stage, evidence still suggests that the reduction in person-hours required to review a large-scale document production will be considerable.

Even where experienced counsel are involved in the process, the template creator must be mindful of keywords and other forms of standard document collation. Most of the time, lawyers do not know in advance exactly which documents will be useful to a case. Traditional review strategies during the discovery phase of litigation often entail identifying search terms likely to locate responsive documents in the data set, known as keywords. These keywords are developed after researching the issues at hand and interviewing individuals and, while they can be useful and help frame the review process, they have serious limitations when used alone.

Let us suppose that we suspect that a dishonest officer is expunging documents from a company database. How do we prove such activity? Often, the initial approach is to undertake a keyword search of log files and other reporting documentation, with the keyword search terms of “delete”, “erase” or “kill”. A search of this type, while it may prove useful at the outset, is limited to those specific terms. If the criminal is devious, erasure will take other forms, and such obvious terms are not the only ones available to remove data. This makes using traditional keyword methods to find the incriminating pattern in data difficult, if not impossible.
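To make the limitation concrete, the sketch below runs that kind of keyword search over a handful of invented log lines; note how the lines using “purge” and “truncate” slip straight past the “delete”, “erase” and “kill” terms.

```python
# Minimal keyword search over log lines, illustrating the limitation
# described above. The log lines and keyword list are invented.
KEYWORDS = {"delete", "erase", "kill"}

log_lines = [
    "2014-03-02 09:14 user jsmith delete record 4471",
    "2014-03-02 09:15 user jsmith purge archive batch 12",   # same conduct, different verb
    "2014-03-02 09:16 user jsmith truncate table invoices",  # also missed by the keywords
]

hits = [line for line in log_lines
        if any(term in line.lower() for term in KEYWORDS)]

print(f"{len(hits)} of {len(log_lines)} suspicious lines matched the keywords")
for line in hits:
    print("MATCH:", line)
```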

When faced with e-discovery, the difficulties for counsel are increased, as traditional text-based e-discovery searches limit what can be uncovered. For example, in the famous Enron dataset, rich and diverse “white noise” was introduced into the data. Enron executives used many code words, such as Star Wars references, to disguise illegal activities; those documents could have provided attorneys with material relevant to a range of crimes and misdemeanours, had they known how the executives coded their datasets. But what reasonable attorney would have thought to use “Millennium Falcon” or “Chewbacca” in a keyword search of an energy company’s transactions? Predictive coding, however, can recognise patterns and alert attorneys to possibly relevant documents, even where there is a seemingly inexplicable and random use of words and phrases.
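As a toy illustration only (the emails and labels below are entirely invented), the sketch shows the underlying idea: no one ever types “Chewbacca” into a search box, yet because the coded phrases co-occur in the documents counsel have marked relevant, the model learns to weight them heavily without any keyword list being supplied.

```python
# Toy illustration of a model learning coded vocabulary from lawyer
# labels alone. The emails and labels are invented; no keyword list
# mentioning the code words is supplied anywhere.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

emails = [
    "move the chewbacca position off the california book before close",
    "chewbacca and death star trades to be unwound quietly",
    "death star capacity charges rebooked per yesterday's call",
    "team lunch at noon on friday",
    "facilities: the level two printer is being replaced",
    "reminder to submit timesheets by friday",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = marked relevant by reviewing counsel

vectoriser = TfidfVectorizer()
model = LogisticRegression().fit(vectoriser.fit_transform(emails), labels)

# Terms with the largest positive weights are the ones the model has
# learned to associate with relevance -- including the code words.
terms = vectoriser.get_feature_names_out()
top = np.argsort(model.coef_[0])[::-1][:5]
for i in top:
    print(f"{terms[i]:12s} {model.coef_[0][i]:+.3f}")
```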

Cost of technology and realistic approaches for the small firm

The cost of technology is no longer a barrier for the small practitioner who sees the big cases as being beyond their reach due to the cost of discovery. New technologies and systems are paving the way for the small firm to compete. The emergence of data analytics and visualisation software and inexpensive VR technology can help lawyers organise and analyse mountains of data in new and different ways. These analysers are designed to reveal trends and focus a legal team’s review efforts, creating efficiency in time and effort. A smaller team can now use visualisations and dashboards to accelerate the discovery of key facts in a timely fashion, without a mountain of lawyers wading through paperwork. The benefits of a more level playing field allow litigants greater choice in their representation.

New technologies have a range of positives as we move into the modern world of large datasets and complex litigation. Small and large firms alike will find them a welcome assistant in the new face of litigation and discovery in the future.

Further reading:

Nicholas M. Pace and Laura Zakaras, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery (RAND Corporation, 2012).
