AI in Discovery: The Future Is Now
Robert J. Burns, Benjamin R. Wilson, Joan M. Washburn write: A well-designed and well-executed process utilizing Continuous Active Learning technology might be what you—and your client, your adversary, and your judge—are seeking to minimize costs, maximize efficiency, and fast-track your case to its more fruitful, and more enjoyable, stages.
November 13, 2017 at 11:00 AM
Few current topics in legal practice provoke reactions as extreme as artificial intelligence does. Utopians promise a radical transformation in litigation, with machines doing our dirty work and leaving us to higher endeavors. Dystopians counter that AI will soon make us all superfluous and unemployed.
This article will not adjudicate the AI dispute. Nor will it assure the reader that he or she may continue litigating in the old familiar ways, blissfully unaware of AI. It may be viewed as good news, or bad news, but it is a fact: This technology is here, it is transformative, and it is gaining judicial traction. If an adversary, a judge, or a client hasn't already asked you whether AI should be employed in your next discovery process, they will soon. Be prepared.
By now, everyone knows we are surrounded by an ocean of data. Clients generate new data at unprecedented speeds and volumes. But when that data becomes discoverable, prior discovery technologies have only increased the risks of drowning. Studies have shown that keyword searching coupled with linear review is ineffective and extremely costly. Practice has shown that first-generation technology-assisted review (TAR) methodologies, while more precise, are challenging to implement and still too costly. As a result, despite the initial transformative promise of TAR, e-discovery remains the most dreaded phase of litigation: for counsel doing the work, for judges resolving the disputes, and for clients paying the bills.
But AI technologies now in active deployment—and enjoying the first blush of judicial endorsement—are likely to transform this process in the near future.
The Past
Since the early 2000s, litigators have already endured several great leaps in discovery practice. In the first stage of evolution, we emerged from dusty file warehouses, and we (or armies of contract lawyers) conducted linear reviews of electronic files from the comfort of our screens. But we soon realized that our clients were generating data faster than our ability to review it meaningfully.
In the second stage, we employed filtering methodologies and keyword searches to cut through the data volume. But those methodologies, too, required linear review of the substantial balance of documents identified. These methodologies also lacked precision, generating numerous false hits. And they triggered competing tensions: plaintiffs' motivation to exhaust all potential sources of discoverable information and defendants' desire to minimize burdens and expenses. The net result was more, not fewer, discovery disputes.
The third stage (TAR, or “predictive coding”) initially seemed the solution. In 2012, the first federal judge endorsed TAR as a viable e-discovery tool, and by 2015, it was “black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.” Rio Tinto PLC v. Vale S.A., 306 F.R.D. 125, 127 (S.D.N.Y. 2015). TAR was enthusiastically embraced by the litigation community as an emerging best practice: TAR, we thought, promised improved effectiveness and efficiency in e-discovery, and a greater understanding of the client's data sets.
But over time, it has become clear that TAR is not the hoped-for panacea. Traditional TAR protocols require a substantial amount of expensive, upfront work. That work should be done by senior lawyers who understand the client's data and the factual and legal issues in the case. Such lawyers have high price tags and severe time constraints, but TAR needs them to review rounds of documents to complete the control and seed set, and to repeat this process until stabilization occurs.
TAR presents other issues, too. Its efficacy depends, in significant measure, on cooperation and transparency among counsel. But courts have been reluctant to compel that transparency, and in its absence uncertainty reigns: The propounding party fears that the algorithm was (intentionally or negligently) trained to miss relevant documents, the responding party fears costly “do-overs,” and both parties face costly motion practice to sort this out. These shortcomings have caused some litigants to revert to the old-fashioned techniques that most had thought abandoned.
The Future (Is Now)
The fourth evolutionary stage is at hand. Continuous Active Learning technology (CAL) allows parties to identify relevant documents with far less effort and cost. Its functionality is simple: Using either key documents or keywords, the technology retrieves ESI, presenting first the documents most likely to be of interest, followed by those less likely to be of interest. The technology continually improves its understanding of what is likely to be relevant. And as that understanding strengthens, the database re-ranks each document in the collection by its likely relevance.
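For readers who want to see the mechanics, the rank, review, and re-rank cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's algorithm: the function names (term_weights, cal_review), the simple word-frequency scoring, the sample documents, and the simulated reviewer are all hypothetical, and commercial CAL tools use far more sophisticated models and stopping criteria.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def term_weights(labeled):
    """Weight each term by how much more often it appears in documents
    coded relevant than in documents coded not relevant (toy model)."""
    rel, non = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else non).update(set(tokenize(text)))
    n_rel = sum(1 for _, r in labeled if r)
    n_non = len(labeled) - n_rel
    return {
        term: rel[term] / max(n_rel, 1) - non[term] / max(n_non, 1)
        for term in set(rel) | set(non)
    }

def score(text, weights):
    return sum(weights.get(t, 0.0) for t in set(tokenize(text)))

def cal_review(collection, seed_docs, reviewer, batch_size=2, max_rounds=10):
    """Repeatedly re-rank unreviewed documents, send the top batch to the
    reviewer, and fold the reviewer's coding back into the model."""
    labeled = [(text, True) for text in seed_docs]  # the initial key documents
    unreviewed = dict(collection)
    order, relevant = [], []
    for _ in range(max_rounds):
        if not unreviewed:
            break
        weights = term_weights(labeled)  # retrain on everything coded so far
        ranked = sorted(unreviewed,
                        key=lambda d: score(unreviewed[d], weights),
                        reverse=True)    # most likely relevant first
        for doc_id in ranked[:batch_size]:
            text = unreviewed.pop(doc_id)
            is_relevant = reviewer(doc_id, text)
            labeled.append((text, is_relevant))
            order.append(doc_id)
            if is_relevant:
                relevant.append(doc_id)
    return order, relevant

# Hypothetical six-document collection and one key document from the client.
collection = {
    "d1": "supply contract breach notice delivery",
    "d2": "lunch menu friday cafeteria",
    "d3": "contract amendment delivery schedule",
    "d4": "holiday party invitation",
    "d5": "breach of delivery terms demand letter",
    "d6": "parking garage closure notice",
}
seed_docs = ["demand letter alleging breach of supply contract"]
truly_relevant = {"d1", "d3", "d5"}  # stands in for the attorney's judgment

order, relevant = cal_review(
    collection, seed_docs, reviewer=lambda doc_id, text: doc_id in truly_relevant
)
print(order)     # review order chosen by the loop
print(relevant)  # documents the reviewer coded relevant
```

In this toy run, the three relevant documents are the first three the reviewer sees. Note that d3 shares only one word with the key document at the outset and rises in the ranking only after the reviewer codes d5 and d1 as relevant, which is the "continuous" part of continuous active learning. The sketch simply runs until the collection is exhausted; a real protocol would stop earlier under a documented, defensible criterion.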
What does this mean in practice? In short, an e-discovery process that more closely parallels the fact-development process we undertake in our cases. Consider this scenario:
A dispute has ripened into litigation, or the threat of it. Your client presents you with a stack of documents that, based on initial conversations with the key players, seem to matter most to the dispute: contracts, demand letters, correspondence, witness statements, and the like. In the old days, you might review those key documents to extract keywords most likely to unearth similar documents. Or, if using first-generation TAR, your review team might use these documents to guide their manual efforts to unearth similar documents and, thereby, train the algorithm. In each instance, humans needed to figure out the story of the case, and then figure out how to tell that story in a language a machine would understand.
But using CAL technology, the machine ingests the fruits of your initial investigation and takes it from there. The database's algorithm performs contextual analysis of your “hot documents” and identifies relevant authors, recipients, and custodians. It identifies the types of documents most likely to be relevant. It searches these materials for words in contextual combinations similar to those in the ingested key documents. And it grows more intelligent as it continues these analyses, with minimal human guidance, ultimately returning a discrete subset of relevant ESI.
This technology requires minimal attorney oversight and minimal refinement. It has no need for recall rates or reliance on seed sets determined through labor-intensive rounds of review. Research has shown that, effectively used, CAL culls higher levels of relevant documents more quickly and with less effort (and therefore cost) than prior e-discovery tools. And, most importantly, CAL comports with the methods litigators actually use to litigate cases: Skilled human lawyers do the initial investigation to determine the story of the case and to identify the key documents telling that story, and machines then dig through data sets to identify all other documents that bear on that story, refining their (and, by extension, the lawyers') “learning” about the case along the way.
This technology now exists and is available for use. And CAL has already received some initial judicial support. Magistrate Judge Andrew Peck, recognized within the Southern District of New York as an expert in e-discovery issues, has twice expressed favorable views regarding CAL as an efficient and effective tool. See Rio Tinto, 306 F.R.D. at 128; Hyles v. New York City, 10 Civ. 3119, 2016 WL 4077114, at *3 (S.D.N.Y. Aug. 1, 2016). We have every confidence that, as CAL technology enters wider use, other courts will join Judge Peck in support.
For early adopters, the scarcity of case law means there is not (yet) a well-defined judicial roadmap for a defensible CAL-based methodology. In the meantime, we offer the following suggestions to maximize your prospects for success:
First, CAL is science, not magic. Upfront issues of preservation and collection remain, and are critical to an effective process. A law firm and its clients must identify, preserve, and collect data sets likely to be relevant, and that data must be converted to structured form in a database. Look to collect and convert a data set with maximum potential to contain relevant information. The more comprehensive the materials are, the smarter the algorithm will be from the outset, netting responsive documents more quickly. Further, a complete and diverse document set will mitigate preservation issues and will ease concerns that key documents were missed.
Second, work closely with technical support staff to develop a CAL methodology that is fully documented, technically sound, and consistent with your client's data systems and architecture. Look also to the well-developed body of case law on TAR defensibility; defensibility concerns in those cases will be relevant to courts in assessing CAL methodologies.
Third, embrace cooperation and transparency. The best defense to any e-discovery methodology is that both parties understand and have agreed to it. Ideally, both parties would compile sets of key documents, and confer to compile the fullest and fairest set of key documents for initial ingestion into the CAL system. In any event, helping your adversary understand the technology, and the manner in which you will implement it, will minimize the risks of motion practice and potentially costly do-overs down the road.
In sum, a well-designed and well-executed process utilizing CAL technology might be what you—and your client, your adversary, and your judge—are seeking to minimize costs, maximize efficiency, and fast-track your case to its more fruitful, and more enjoyable, stages.
Robert J. Burns is a partner and Benjamin R. Wilson is an associate in Holland & Knight's New York office. Joan M. Washburn is the firm's director of litigation e-discovery services.