eDiscovery

After reading the new Technology Assisted Review (TAR) Guidelines from EDRM, it is clear that the evolution of the underlying technology in TAR solutions is reshaping the role of the subject matter expert (SME). While the Guidelines maintain the SME's role (typically filled by an experienced, and expensive, attorney most familiar with the project's subject matter) in ensuring reviewer accuracy and assisting in training the model, they also acknowledge the emergence of new technologies that can reduce the burden on the SME.

Newer active learning solutions allow for continuous training of the model through a prioritized review. This spares the SME the multiple rounds of training and quality control (QC) review associated with TAR 1.0 solutions and frees time for more targeted training. TAR 1.0 solutions can take longer to train the model, whereas an active learning model begins training after a smaller threshold number of documents has been reviewed.
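
To make that loop concrete, here is a minimal sketch in Python of a continuous, prioritized-review cycle: the model retrains as coding decisions come in and always serves the highest-scoring uncoded document next. The toy corpus, the logistic regression model, and the stand-in "reviewer" are all illustrative assumptions, not any vendor's implementation.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy corpus standing in for a document collection (illustrative only).
    docs = [
        "merger agreement draft attached",
        "lunch order for the team",
        "privileged legal advice re merger",
        "quarterly sales figures",
        "merger due diligence checklist",
        "office holiday party photos",
    ]
    labels = {0: 1, 1: 0}  # coding decisions so far: doc index -> relevant (1) or not (0)

    X = TfidfVectorizer().fit_transform(docs)

    # Continuous-learning loop: retrain on every coded document, then serve
    # the highest-scoring uncoded document to the reviewer next.
    while len(labels) < len(docs):
        coded = sorted(labels)
        model = LogisticRegression().fit(X[coded], [labels[i] for i in coded])
        uncoded = [i for i in range(len(docs)) if i not in labels]
        scores = model.predict_proba(X[uncoded])[:, 1]
        best = int(np.argmax(scores))
        nxt = uncoded[best]
        print(f"serve doc {nxt}: {docs[nxt]!r} (score {scores[best]:.2f})")
        labels[nxt] = 1 if "merger" in docs[nxt] else 0  # stand-in for the human decision

Because the model refreshes as each decision lands, there is no discrete "training round" for the SME to sit through; training is a byproduct of the review itself.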

Training sets, traditionally a TAR 1.0 feature, are offered with some active learning solutions. These allow the SME to elevate training through an isolated review of conceptually rich key documents while the review team focuses on the prioritized review queue. The training set can be either a completely randomized sample across the corpus or a seed set supplemented with a randomized sample. This approach aims to minimize bias while still injecting richness into the sample.
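
As a rough illustration of the two approaches, the sketch below builds a training set either as a pure random sample or as a seed set supplemented with a random sample. The function name and parameters are hypothetical, not any platform's actual method.

    import random

    def build_training_set(corpus_ids, seed_ids=None, sample_size=100, rng=None):
        """Assemble a training set: either a purely randomized sample across
        the corpus, or a judgmental seed set supplemented with a random
        sample (hypothetical sketch)."""
        rng = rng or random.Random(42)
        seed_ids = list(seed_ids or [])
        remaining = [d for d in corpus_ids if d not in set(seed_ids)]
        return seed_ids + rng.sample(remaining, min(sample_size, len(remaining)))

    # Pure random sample: maximizes protection against selection bias.
    random_only = build_training_set(range(10_000))
    # Seed set plus random supplement: injects conceptually rich documents
    # while the random portion still guards against bias.
    seeded = build_training_set(range(10_000), seed_ids=[12, 847, 3991])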

The Guidelines acknowledge that there are different views on the best method of selecting training sets. The different approaches reflect varying levels of concern that relying on “human judgment” or “differing preferences by human reviewers” to select the documents will bias the training set. The Guidelines instruct that any approach to selecting training data will produce an effective predictive model if it is used to produce a sufficiently broad training set. “Thus, differing views over selection of training data are less about whether an effective predictive model can be produced, than about how much work it will take to do so.”

Newer TAR solutions alleviate the burden of training in other ways. In some platforms, multiple models can run concurrently. This allows a reviewer coding for relevance to simultaneously train models for privilege or specific issues, thereby cutting back on costly re-review efforts.
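
Conceptually, this is possible because a single coding pass can feed several independent classifiers at once, as in this hedged sketch; the coding dimensions and model choice are illustrative only.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier

    # One model per coding dimension; a single review pass trains all of them.
    models = {name: SGDClassifier()
              for name in ("relevance", "privilege", "issue_fraud")}

    docs = ["privileged advice on the fraud claim", "cafeteria menu update"]
    coding = [  # one reviewer decision covers every dimension at once
        {"relevance": 1, "privilege": 1, "issue_fraud": 1},
        {"relevance": 0, "privilege": 0, "issue_fraud": 0},
    ]

    X = TfidfVectorizer().fit_transform(docs)
    for name, model in models.items():
        model.partial_fit(X, [c[name] for c in coding], classes=[0, 1])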

Active learning solutions can also more easily address the challenge of supplemental collections. With earlier (TAR 1.0) solutions, when new datasets introduced new document features or concepts to the corpus, the model would need additional training in order to properly understand and categorize these new document types. Due to the static nature of the predictive coding index, each addition of this type would require the training process to start anew: rebuilding the index and repeating the human review, including coding a seed set and conducting the numerous rounds of training and QC review needed to reach stability.

With an active learning solution, since the model is continuously learning and improving its predictions, it can leverage its existing training to incorporate the new collection. This eliminates the need to “start from scratch.”
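
One way to picture this, as a sketch under assumptions rather than any product's architecture: if the feature index is stateless (hashing-based, for instance) and the learner supports incremental updates, a supplemental collection simply becomes another round of training.

    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    # A stateless hashing vectorizer means new documents never force an
    # index rebuild; partial_fit folds them into the existing model.
    vectorize = HashingVectorizer(n_features=2**16).transform
    model = SGDClassifier()

    # Original collection, already coded and trained.
    model.partial_fit(vectorize(["merger agreement", "lunch menu"]),
                      [1, 0], classes=[0, 1])

    # Supplemental collection arrives: incremental training, no restart.
    model.partial_fit(vectorize(["merger escrow terms", "parking notice"]),
                      [1, 0])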

With more time savings in model training through active learning, the SME can lend more of their expertise to QC review. In active learning solutions, differences between human coding decisions and model predictions are typically served up in two separate conflicts queues. These queues can be batched out or sampled for SME review. Where the documents in a project consist of user-created content and represent multiple concepts, the data set is considered to have high conceptual richness. This may lead to a higher percentage of documents with features that the predictive coding model does not understand, which can in turn lead to disparate confidence levels and document populations with low coverage, posing a challenge to training.
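
In principle, a conflicts queue is easy to express: collect documents where the human call and the model's prediction disagree, split by direction. The document IDs and the 0.5 cutoff below are illustrative assumptions.

    # Documents where the reviewer's coding disagrees with the model's
    # prediction, split into two queues by direction of the conflict.
    coded = {"DOC-001": 1, "DOC-002": 0, "DOC-003": 1}               # human calls
    predicted = {"DOC-001": 0.15, "DOC-002": 0.92, "DOC-003": 0.88}  # model scores

    coded_yes_model_no = [d for d, y in coded.items() if y == 1 and predicted[d] < 0.5]
    coded_no_model_yes = [d for d, y in coded.items() if y == 0 and predicted[d] >= 0.5]

    print("Queue 1 (coded relevant, predicted not):", coded_yes_model_no)
    print("Queue 2 (coded not relevant, predicted relevant):", coded_no_model_yes)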

The model's understanding of these documents, and the resulting prediction scores, can be improved by training the system on more documents from lower-coverage sets. To address this problem, some of today's active learning solutions offer coverage queues and visualizations that eliminate the need for complex saved searches to review these sets. The SME can, therefore, easily sample documents from these sets to improve predictions for the greater review team.
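
In spirit, such a queue amounts to surfacing the documents the model is least sure about so the SME can sample them directly. The score band and sample size in this sketch are assumptions, not platform defaults.

    import random

    # Surface documents whose prediction scores sit near the middle of the
    # range, where the model's understanding is weakest, so the SME can
    # sample and code them without building saved searches.
    rng = random.Random(7)
    scores = {f"DOC-{i:03d}": rng.random() for i in range(200)}
    low_coverage = [d for d, s in scores.items() if 0.4 <= s <= 0.6]
    sme_sample = rng.sample(low_coverage, min(10, len(low_coverage)))
    print("SME coverage sample:", sme_sample)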

With earlier TAR technologies, the SME might have been heavily involved with training the model throughout the life of the project. The newer features of today's active learning solutions can help alleviate that burden and free the SME's time for other priorities. By lowering the barrier to implementation in both time and cost, active learning has become a more attractive option for meeting the proportionality and reasonableness requirements of review, for both the end client and the SME.

Erin Baksa is a Senior Business Development Manager at Everlaw. Prior to Everlaw, she worked in ediscovery consulting as a Senior Manager for the Forensic Technology Services team at A&M Asia in Hong Kong. Previous consulting firms include Stroz Friedberg and DTI. Erin is a licensed attorney and has worked in the litigation industry for over 10 years.