The commercial launch of ChatGPT in late 2022 sparked a resurgence of interest in the potential for artificial intelligence to automate, optimize, and revolutionize traditional processes in virtually every area of business. Enthusiasm for AI and large language models underscores a pivotal moment for legal technology in general and eDiscovery in particular. While the legal industry moved slowly in accepting "early" technology-assisted review workflows, interest in adopting generative AI is strong. A recent survey about the uptake of AI in legal environments reveals a robust appetite for AI implementation to optimize legal operations. Key findings include:

  • A majority (60%) of respondents see AI as freeing up time for higher-value work.
  • Over half (55%) believe AI boosts in-house legal team productivity.
  • 41% see the automation of manual tasks as a significant productivity boon, underscoring AI's potential to transform the tedious aspects of legal work.
  • 49% of professionals view AI as a tool for simplifying complex, error-prone processes.

As we embrace a new generation of AI-powered tools and workflows, the leading practices discovery professionals have established for measuring quality and pairing the right tool with the job will be valuable to the broader legal community. Recalling our path to adopting "first generation" AI solutions can help us approach modern AI with confidence.

A Decade of AI in eDiscovery to Build On

By the Text REtrieval Conference (TREC) Legal Track evaluations of the late 2000s, the numbers had proven that technology-assisted review could be more effective and more efficient than unassisted human review. Since the early 2010s, the use of TAR has been recognized as black-letter law. Fast-forward to the 2020s, and we have different flavors of TAR: predictive coding (TAR 1.0), active learning (TAR 2.0), and multiple enhanced "TAR 3.0" solutions. Our experience with TAR protocols gives us a strong foundation for evaluating and integrating new technology. Here are some lessons to keep in mind:

  1. Understand the technology and the terminology. Just as different varieties of TAR are optimal in different situations, there are different kinds of generative AI and large language models that are optimal for different uses. To take LLMs as an example: a GPT LLM that supports direct queries of a document set might not be the same completion LLM used to generate a summary, or the LLM used to categorize a document. And none of these is the same as the embedding LLM whose vectors might be used to improve the performance of SVM-enabled TAR solutions (see the first sketch after this list). And no matter the use case, any LLM will likely be augmented in some fashion, whether by a different LLM or by a different technology. We are still working toward a common vocabulary for talking about AI, so when discussing AI solutions, it's important to take a moment to clarify terms of art and confirm a shared understanding.
  2. Know the goals. It's important to be purposeful when integrating AI. We know that when choosing between TAR 1.0 and TAR 2.0 solutions, we must consider time pressure, the availability of SMEs, and whether data is immediately available or will roll in. When we consider using modern AI, we need to think through our goals as well: Are we seeking to learn new facts? Are we investigating to inform case strategy? Or are we attempting to exhaustively review a set of documents for production? Our answers will help us pair the right modern AI with the task. As we begin a new era, our purpose might simply be to gain experience and comfort with generative AI-derived insights. Injecting AI into a well-established workflow just to see how it compares to traditional processes may be a valuable use of time and resources. For example, providing AI-automated responsiveness calls or document summaries in a workflow alongside familiar predictive scores can build comfort with GenAI insights and foster practical understanding of where GenAI provides a measurable advantage (see the second sketch after this list).
  3. Measure, measure, measure. At this early stage it's crucial to measure not only the accuracy of a generative AI solution's outputs, but also the effort and cost of achieving those results. The potential of GenAI-powered solutions is real, but we are still honing our skills. LLMs have generated impressive results in early use for document categorization, so there's reason to expect they will eventually automate much (if not all) of traditional first-pass review. Live use still requires robust QC, and, depending on the topics and categories, it might take model tuning and multiple rounds of processing to achieve acceptable quality. Comparing AI to previous technologies or to human benchmarks requires a clear definition of what constitutes better performance. This evaluation should consider accuracy, efficiency, cost-effectiveness, and even environmental impact.
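
To make the embedding use case in point 1 concrete, here is a minimal sketch of feeding LLM embedding vectors into an SVM classifier for responsiveness. It is illustrative only: it assumes the open-source sentence-transformers and scikit-learn Python packages, and the model name, sample documents, and coding labels are all hypothetical.

    # Sketch: LLM embedding vectors as features for an SVM relevance
    # classifier, the kind of enhancement to SVM-enabled TAR described
    # in point 1. Package names are real; documents and labels are made up.
    from sentence_transformers import SentenceTransformer
    from sklearn.svm import LinearSVC

    docs = [
        "Q3 revenue forecast attached for review.",
        "Lunch order for Friday's team outing.",
        "Draft merger agreement, privileged and confidential.",
        "Parking garage is closed for maintenance next week.",
    ]
    labels = [1, 0, 1, 0]  # 1 = responsive, 0 = not responsive (reviewer coding)

    # Encode each document into a dense vector with an embedding model.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    X = encoder.encode(docs)

    # Train a linear SVM on the embedding vectors, mirroring the SVM step
    # in many TAR tools (here fed richer, LLM-derived features).
    clf = LinearSVC().fit(X, labels)

    new_doc = encoder.encode(["Board minutes discussing the acquisition."])
    print(clf.predict(new_doc))  # e.g., [1] -> predicted responsive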
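
And as an illustration of the side-by-side approach in point 2, the following sketch compares GenAI responsiveness calls against familiar predictive-coding scores. All document IDs, scores, and calls are hypothetical; in a real workflow both sets of values would come from the review platform.

    # Sketch: comparing GenAI responsiveness calls with existing
    # predictive-coding scores. All values are hypothetical.
    docs = [
        {"id": "DOC-001", "predictive_score": 0.92, "genai_call": "responsive"},
        {"id": "DOC-002", "predictive_score": 0.15, "genai_call": "not responsive"},
        {"id": "DOC-003", "predictive_score": 0.78, "genai_call": "not responsive"},
    ]

    THRESHOLD = 0.50  # the score cutoff the team already uses for TAR review

    agree = sum(
        (d["predictive_score"] >= THRESHOLD) == (d["genai_call"] == "responsive")
        for d in docs
    )
    print(f"agreement: {agree}/{len(docs)} documents")
    # Disagreements (here, DOC-003) are natural candidates for targeted QC.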

A final note on measurement: the law has embraced precision and recall as standard metrics for quantifying the accuracy of any information retrieval process, and statistically valid measurement is now supported by most review platforms. What's new with many generative AI solutions currently under development, testing, and early implementation is the introduction of automatically generated summaries and rationales. We don't yet have standards for the qualitative measurement of these subjective insights. The academic natural language processing community has generally accepted evaluation methods for such outputs, but the eDiscovery and broader legal community will need to determine how we'll measure quality for our specific purposes.
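
To make the familiar quantitative metrics concrete, here is a minimal sketch of computing precision and recall for a set of AI responsiveness calls validated against a human-coded sample. The counts are hypothetical.

    # Sketch: precision and recall for AI responsiveness calls measured
    # against reviewer coding on a validation sample. Counts are made up.
    true_positives = 180   # AI said responsive; the reviewer agreed
    false_positives = 20   # AI said responsive; the reviewer disagreed
    false_negatives = 45   # AI missed a document the reviewer coded responsive

    precision = true_positives / (true_positives + false_positives)  # 180/200 = 0.90
    recall = true_positives / (true_positives + false_negatives)     # 180/225 = 0.80
    print(f"precision={precision:.2f}, recall={recall:.2f}")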