Too many organizations approach litigation and compliance investigations the same way, using the same technology, methods and people. However, managing electronic information in a federal regulatory compliance investigation differs significantly from reviewing documents in litigation. Understanding this difference is the key to recognizing and defining the capabilities that are necessary for effective investigations and workflow.

A litigation review is about just that: reviewing documents. Reviewers are trained to recognize documents relating to a well-developed fact pattern underlying an articulated dispute. Discovery deadlines and mountains of ESI place the emphasis on review speed and volume. Analytics play a narrow, focused role in review—devoted primarily to improving efficiency and maximizing throughput.

Investigations absolutely need to take a different tack. By comparison, developing a cogent fact pattern is the ultimate objective of an investigation, not the starting point. Investigations are about fact development through targeted document analysis, not exhaustive document review. That means a different mindset, a different approach and significantly more emphasis on effective analytics.

Why the Difference?

The principal difference between a litigation review and an investigation is the lack of knowledge. An investigation is like a jigsaw puzzle with few pieces on the table: little is known at the outset. Consequently, investigators search the documents for threads, patterns and relationships that can be connected to reveal the underlying narrative.

Approaching this effort like a litigation review will stymie an investigation. Batching documents for serial review, even the “best” and most similar documents, is redundant and inefficient. There is no need to find all documents concerning any aspect of the investigation, just enough to fully answer any questions. Then move on to the next question. (Thus the need for advanced analytics to focus review on only the most critical documents.)

Leveraging Advanced Analytics

Modern analytics techniques can be a scalpel in the hands of a skilled investigator, cutting through layers of irrelevant documents to reach the heart of the inquiry. And certain analytics have proven to be particularly helpful.

Most investigations start with a mass of documents from disparate sources, with little or no organization—the inboxes of several employees from different departments, for example. Attempting to wade through that morass, even with traditional Boolean search techniques, can be a daunting and inefficient task.

To provide some structure to the collection, investigation tools rely on unsupervised machine learning techniques to group documents into labeled concepts or clusters based on semantic similarities. The labels provide a virtual table of contents, and a granular insight into the substance of the documents within the collection. Investigators can use those labels to get a sense of the nature and diversity of the collection, to make gross decisions on the probative value of individual groups, and even to direct and narrow their focus for further investigation.
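
As a rough illustration of the idea, the sketch below groups a handful of documents and labels each group with its most common terms. It is a toy under stated assumptions, not how any particular investigation tool works: plain word overlap (Jaccard similarity) stands in for semantic similarity, and the document ids, stopword list, threshold and function names are all invented for the example.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real lexicon).
STOPWORDS = {"the", "a", "an", "of", "to", "for", "and", "on", "in", "is", "by"}

def tokens(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def jaccard(a, b):
    """Share of terms two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.25):
    """Single pass: join the most similar existing cluster, else start a new one."""
    clusters = []  # each cluster is a list of (doc_id, token_set)
    for doc_id, text in docs.items():
        toks = tokens(text)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = max(jaccard(toks, other) for _, other in c)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.append((doc_id, toks))
        else:
            clusters.append([(doc_id, toks)])
    return clusters

def label(c, n=2):
    """Label a cluster with its n most frequent terms (ties broken alphabetically)."""
    counts = Counter(t for _, toks in c for t in toks)
    return sorted(counts, key=lambda t: (-counts[t], t))[:n]

# Hypothetical four-document collection.
docs = {
    "d1": "quarterly revenue forecast for the board",
    "d2": "revised quarterly revenue forecast",
    "d3": "vendor invoice payment overdue",
    "d4": "overdue vendor invoice reminder",
}
clusters = cluster(docs)
members = [[doc_id for doc_id, _ in c] for c in clusters]
labels = [label(c) for c in clusters]
for names, ids in zip(labels, members):
    print(names, ids)
```

Even at this scale, the labels act as the "table of contents" described above: an investigator can decide whether the forecast group or the invoice group deserves attention before opening a single document.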

State-of-the-art communication analytics provide another means of superimposing structure and surfacing critical information early in the investigation. At the highest level, communication analytics provide a macroscopic view of the social network spanning the collection. At the individual level, they identify personal communication patterns; at the domain level, they highlight which communications stay within the organization and which leave it. Digging deeper into one-to-one communications can uncover previously unknown witnesses or custodians who can be integrated into the investigatory process.
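
A minimal sketch of the underlying bookkeeping might look like the following. It assumes that each message has already been reduced to a sender and a list of recipients; the addresses, the `corp.example` domain and the function names are hypothetical.

```python
from collections import Counter

def domain(address):
    """Everything after the last '@', lowercased."""
    return address.rsplit("@", 1)[-1].lower()

def communication_summary(messages, internal_domain):
    """Count one-to-one links and split them into internal vs. boundary-crossing."""
    pair_counts = Counter()
    internal = external = 0
    for sender, recipients in messages:
        for recipient in recipients:
            pair_counts[(sender, recipient)] += 1
            if domain(sender) == internal_domain and domain(recipient) == internal_domain:
                internal += 1
            else:
                external += 1
    return pair_counts, internal, external

def unknown_contacts(pair_counts, known_custodians):
    """Addresses that appear in the traffic but are not yet on the custodian list."""
    seen = {address for pair in pair_counts for address in pair}
    return sorted(seen - known_custodians)

# Hypothetical message metadata: (sender, [recipients]).
messages = [
    ("ann@corp.example", ["bob@corp.example"]),
    ("ann@corp.example", ["bob@corp.example", "cfo@rival.example"]),
    ("bob@corp.example", ["cfo@rival.example"]),
]
pairs, internal, external = communication_summary(messages, "corp.example")
unknown = unknown_contacts(pairs, {"ann@corp.example", "bob@corp.example"})
print(pairs.most_common(1), internal, external, unknown)
```

The pair counts give the one-to-one view, the internal and external tallies give the domain view, and the list of unfamiliar addresses is a starting point for identifying additional witnesses or custodians.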

There are many advanced analytics techniques that can be leveraged to effectively reach the critical documents during an investigation: regex and other entity identification tools, interactive timeline controls, sentiment analysis, etc. The constant in investigations, however, is the focus on finding and analyzing only critical documents, rather than conducting an exhaustive review for tangentially related ESI.
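
Regex-based entity identification is the simplest of these to illustrate. The patterns below are deliberately narrow assumptions (one email shape, US-style dollar amounts, ISO-format dates) and the sample sentence is invented; a production tool would use far broader patterns.

```python
import re

# Illustrative patterns only; real collections need many more variants.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AMOUNT = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_entities(text):
    """Return every email address, dollar amount and ISO date found in the text."""
    return {
        "emails": EMAIL.findall(text),
        "amounts": AMOUNT.findall(text),
        "dates": DATE.findall(text),
    }

sample = "On 2019-03-14 jdoe@corp.example approved a $12,500.00 transfer."
entities = extract_entities(sample)
print(entities)
```

Extracted entities like these can then feed a timeline or a search for every document that mentions the same amount or date.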

Optimizing Technology-Assisted Review

Technology-assisted review ensures a thorough document review in the litigation context. Managed properly, TAR can be an effective technique for locating pertinent documents in an investigation as well.

The right technology is critical. Continuous active learning protocols are imperative, because a CAL algorithm trains from the very first decision and returns documents that are most similar to the relevant training examples.

Training should be efficient. Once documents relevant to an inquiry are identified, they become training examples to uncover related documents. Otherwise, train with an exemplar synthetic seed—studies show that a CAL algorithm is effective with just a single training document.

Once the documents become redundant, move to the next area of inquiry. Again, unlike litigation, there is no need to find every relevant document, just enough to answer the questions.
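
The three points above can be sketched as a single loop. This is a simplified stand-in, not a real CAL implementation: a summed term-count profile with cosine similarity replaces the trained classifier, the `is_relevant` callable plays the human reviewer, and "redundant" is approximated as a run of consecutive non-relevant documents. All names and sample documents are assumptions made for the example.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cal_review(docs, seed_text, is_relevant, max_misses=2):
    """Review one document at a time, retraining after every decision."""
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    profile = vectorize(seed_text)  # starts from a single (synthetic) seed
    unreviewed = set(docs)
    found, misses = [], 0
    while unreviewed and misses < max_misses:
        # Show the reviewer the document most similar to the relevant examples so far.
        nxt = max(sorted(unreviewed), key=lambda d: cosine(profile, vectors[d]))
        unreviewed.remove(nxt)
        if is_relevant(nxt):
            found.append(nxt)
            profile.update(vectors[nxt])  # the new decision immediately becomes training data
            misses = 0
        else:
            misses += 1  # a run of misses suggests this line of inquiry is exhausted
    return found, len(docs) - len(unreviewed)

# Hypothetical collection; d1, d2 and d5 are the documents the reviewer would mark relevant.
docs = {
    "d1": "backdated invoice approved by regional manager",
    "d2": "manager asked to backdate the invoice again",
    "d3": "lunch menu for friday",
    "d4": "parking garage closed friday",
    "d5": "invoice backdated per manager request",
    "d6": "holiday schedule posted",
}
found, reviewed = cal_review(docs, "backdated invoice", lambda d: d in {"d1", "d2", "d5"})
print(found, reviewed)
```

In this toy run the loop surfaces the three relevant documents first and stops before reading the whole collection, which is the behavior an investigator wants before moving to the next area of inquiry.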

Exploring the Unknown

Every investigation suffers from the concern that there may be pertinent documents that go unseen. Advanced analytics focus the investigation on specific areas of inquiry. TAR expands the document review in those same areas. But neither technique is directed at discovering the unknown.

State-of-the-art TAR tools include functionality directed at exploring those unknown areas, by locating the documents that are most contextually diverse from everything reviewed to that point. Those documents may or may not be pertinent to the investigation, but reviewing contextually diverse documents minimizes the likelihood of missing critical information that was otherwise unknown.
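
How a given tool measures contextual diversity is vendor-specific and not described here. One simple way to approximate the idea, offered purely as an assumption, is a farthest-first selection: pick the unreviewed document whose closest reviewed neighbor is least similar. The sample texts and function names are hypothetical.

```python
def tokens(text):
    """Lowercased word set."""
    return set(text.lower().split())

def jaccard(a, b):
    """Share of terms two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_diverse(unreviewed, reviewed):
    """Return the unreviewed id whose nearest reviewed document is least similar."""
    reviewed_sets = [tokens(text) for text in reviewed.values()]

    def closeness(doc_id):
        toks = tokens(unreviewed[doc_id])
        return max((jaccard(toks, r) for r in reviewed_sets), default=0.0)

    # sorted() makes tie-breaking deterministic.
    return min(sorted(unreviewed), key=closeness)

# Hypothetical state of a review in progress.
reviewed = {
    "r1": "invoice backdated per manager request",
    "r2": "backdated invoice approved",
}
unreviewed = {
    "u1": "second backdated invoice request",
    "u2": "offshore account wire transfer",
    "u3": "manager approved invoice",
}
pick = most_diverse(unreviewed, reviewed)
print(pick)
```

Here the selection lands on the wire-transfer document, a topic that nothing reviewed so far has touched. Whether it matters is for the investigator to judge.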

Proving a Negative

Finally, combining each of these techniques is one way to demonstrate that there are no (or statistically very few) pertinent documents in a collection. This step of “proving a negative” is particularly useful in responding to governmental information requests.

In reality, proving a negative requires an investigator to use every one of these techniques in a diligent effort to find responsive documents. Again, there is no need to review every document in the collection. Once enough documents have been reviewed to make a reasonable statistical showing of the paucity of responsive documents, the review can conclude.
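
One common way to frame such a showing, given here as a sketch rather than a prescribed protocol, is to draw a simple random sample from the unreviewed documents and, if none is responsive, compute an upper confidence bound on the responsive rate. The calculation below assumes a binomial model (a collection much larger than the sample) and a 95% confidence level; both are assumptions chosen for the example.

```python
from math import ceil, log

def upper_bound(sample_size, confidence=0.95):
    """Highest responsive rate still consistent with zero hits in a random sample."""
    return 1 - (1 - confidence) ** (1 / sample_size)

def sample_size_for(max_rate, confidence=0.95):
    """Smallest all-clean random sample that supports a bound of max_rate."""
    return ceil(log(1 - confidence) / log(1 - max_rate))

needed = sample_size_for(0.01)  # sample needed to support a 1% bound
bound = upper_bound(needed)
print(f"{needed} clean documents support an upper bound of {bound:.4%}")
```

Under these assumptions, a clean random sample of roughly 300 documents supports a statement that no more than about 1% of the remaining collection is responsive, with no need to review every document.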

Ultimately, an investigation is about finding critical documents, not exhaustive review. Modern e-discovery tools make that possible.

Thomas Gricks, Esq., is director, Data Analytics, Catalyst (part of OpenText). Tom advises corporations and law firms on best practices for applying TAR technology.