Predictive coding has tremendous appeal, at least in theory. In practice, however, many parties have been deterred from using it by the hurdles that can arise. With some forethought and preparation, and by involving people with the right expertise, many of those hurdles can be overcome, or at least minimized, so that parties can more often realize the potential benefits of predictive coding.

What Is Predictive Coding?

Predictive coding, often referred to as “technology assisted review” or “TAR,” uses mathematical and statistical algorithms to determine whether documents are likely to be relevant. It relies on machine learning: reviewers code sample documents drawn from the overall document population, and the tool learns from those coding decisions.

Essentially, the predictive coding tool identifies other documents in the population that share similar features with the sample documents coded as “positive” (i.e., relevant or responsive) or “negative” (i.e., irrelevant or non-responsive).
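To make the idea concrete, here is a minimal sketch in Python of scoring uncoded documents by their similarity to coded samples. It is only an illustration: the word-count comparison, the function names (tokenize, vectorize, cosine, centroid, score), and the sample documents are all assumptions made for this example, and commercial predictive coding tools use their own, more sophisticated algorithms.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase the text and split it into words, trimming punctuation."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]

def vectorize(text):
    """Represent a document as word counts (a simple bag of words)."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def centroid(docs):
    """Combine the word counts of several documents into one profile."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return total

def score(document, positives, negatives):
    """Above zero: closer to the 'positive' samples; below zero: closer to the 'negative' ones."""
    vec = vectorize(document)
    return cosine(vec, centroid(positives)) - cosine(vec, centroid(negatives))

# Hypothetical sample documents coded by reviewers.
positives = ["pricing agreement with the distributor",
             "email about the distributor pricing terms"]
negatives = ["lunch menu for the office party",
             "parking garage closed on friday"]

# Hypothetical uncoded documents from the wider population.
uncoded = ["draft pricing terms for distributor contract",
           "office party moved to friday"]

for doc in uncoded:
    label = "likely relevant" if score(doc, positives, negatives) > 0 else "likely not relevant"
    print(f"{label}: {doc}")
```

The design choice here is deliberately simple: each uncoded document is compared with the combined vocabulary of the positive samples and of the negative samples, and whichever it resembles more determines the prediction.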

How Does It Work?

To understand how to make predictive coding practical, you first need a general understanding of how it works.

The traditional workflow for predictive coding has involved commencing machine learning with a “seed set” of pre-coded documents. The seed set can be assembled by random sampling, by running initial search terms, by drawing on documents already determined to be relevant, or through other means.
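The seed set sources described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function build_seed_set, its parameters, and the document IDs are hypothetical, and the search-term match is a plain substring check rather than the query syntax a real review platform would offer.

```python
import random

def build_seed_set(documents, sample_size=0, search_terms=(),
                   known_relevant_ids=(), rng_seed=0):
    """Return the sorted IDs of documents chosen for the seed set.

    documents: dict mapping a document ID to its text.
    sample_size: how many documents to pick at random.
    search_terms: documents containing any of these terms are included.
    known_relevant_ids: documents already determined to be relevant.
    """
    rng = random.Random(rng_seed)  # fixed seed so the sample is repeatable
    selected = set()

    # Source 1: a random sample of the population.
    ids = sorted(documents)
    selected.update(rng.sample(ids, min(sample_size, len(ids))))

    # Source 2: documents hit by initial search terms (case-insensitive).
    terms = [term.lower() for term in search_terms]
    for doc_id, text in documents.items():
        lowered = text.lower()
        if any(term in lowered for term in terms):
            selected.add(doc_id)

    # Source 3: documents already determined to be relevant.
    selected.update(i for i in known_relevant_ids if i in documents)

    return sorted(selected)

# Hypothetical document population.
population = {
    "D1": "Pricing agreement with the distributor",
    "D2": "Lunch menu for the office party",
    "D3": "Revised pricing schedule attached",
    "D4": "Parking garage closed on Friday",
    "D5": "Notes from the call with the distributor",
}

seed_ids = build_seed_set(population, sample_size=2,
                          search_terms=["pricing"],
                          known_relevant_ids=["D5"])
print(seed_ids)
```

Whatever the source, the documents selected this way still have to be coded by reviewers before they can serve as the seed set.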