Are People the Weak Link in Technology-Assisted Review?
The weak link preventing technology-assisted review (TAR) from achieving its true potential is a lack of clarity surrounding the technology—the components, the development and the distinctions.
February 01, 2019 at 12:53 PM
In a word, yes. But it's not what you might think.
The weak link preventing technology-assisted review (TAR) from achieving its true potential is a lack of clarity surrounding the technology—the components, the development and the distinctions. No doubt, TAR is seeing greater acceptance and refinement in the legal space. But with a deeper understanding of the technology, TAR can be even more useful and effective.
Understanding the Technology
To start, TAR is a process by which reviewers code documents for some target criteria (e.g., responsiveness), and an algorithm uses those coding decisions to efficiently manage the review of the unseen documents—known as “supervised machine learning.” Some TAR processes manage review by categorizing the remaining documents, others manage by ranking the collection. Either way, the goal is to effectively train the algorithm and minimize the number of documents that need to be reviewed to achieve recall objectives for the target criteria.
If coding decisions are not being used to train the algorithm (that is, if the technology relies on "unsupervised machine learning"), the process simply is not a TAR process. Therefore, while clustering, near-duplicate analysis and email threading all use technology to aid in the review process, they are not TAR for purposes of this discussion.
A true TAR application has three layers. The base layer consists of feature extraction, where the documents are decomposed into the elements, or “features,” that will be used by the algorithm to evaluate coding decisions, and compare and make decisions about unreviewed documents. On top of feature extraction sits the supervised machine learning algorithm layer. And the entire TAR operation is directed by the “process” layer, which controls all aspects of the training protocol.
Contemporary feature extraction techniques typically focus on the text in the body of individual documents. Features most often consist of individual words or word fragments. However, expanding the feature set to include two- and three-word segments has been found to improve performance. Conversely, feature reduction techniques such as latent semantic indexing, which consolidate multiple words into a single proxy feature, have been shown to degrade performance with most TAR algorithms.
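To make the feature-extraction layer concrete, here is a minimal sketch in Python of the decomposition described above: the body text is split into individual words, then expanded with two-word segments (bigrams). The function name and the tokenization rule are illustrative assumptions, not any vendor's implementation.

```python
import re

def extract_features(text, max_n=2):
    """Decompose a document's body text into unigram and bigram features,
    the kind of expanded feature set noted above as improving performance.
    Illustrative sketch only; production systems add normalization,
    hashing and weighting on top of this."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    features = set(tokens)  # individual-word features
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.add(" ".join(tokens[i:i + n]))  # word-segment features
    return features

# A short, hypothetical document body:
feats = extract_features("Privilege review of the merger documents")
# feats now contains "privilege", "merger documents", "review of", etc.
```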
As research and development continue, the feature extraction layer is likely to see expansion beyond the body text, and continued refinement to improve TAR efficiency. See, e.g., Jones, Amanda, et al., "The Role of Metadata in Machine Learning for Technology Assisted Review," DESI VI Workshop, June 8, 2015.
At the next level, the consistent emphasis on identifying the specific TAR algorithm is a prime example of the educational weak link that inhibits progress. With a few exceptions, the supervised machine learning algorithms used in TAR applications, all other things being equal, will see somewhat equivalent results. Whether it's SVM (support vector machine), logistic regression, Naïve Bayes or even a proprietary algorithm, operational differences typically do not depend on the specific TAR algorithm being used.
Certainly, however, there are a few exceptions. The 1-nearest neighbor algorithm has been shown to be somewhat ineffective in e-discovery review applications. And there is simply not enough training data to take advantage of deep learning algorithms in e-discovery. Conversely, incorporating reinforcement learning may well improve the effectiveness of a TAR algorithm.
As an aside to clarify messaging: because TAR applications rely on supervised machine learning algorithms, TAR is by definition using artificial intelligence (AI), since supervised machine learning is one form of AI.
Differentiating TAR 1.0 from TAR 2.0
Perhaps the most significant distinction between TAR applications is found at the process layer, which can be broken down into two principal categories that are most often referred to as TAR 1.0 and TAR 2.0. The primary distinction stems from the protocol for training the algorithm.
In a TAR 1.0 application, documents are reviewed and coded to train the algorithm only until either the algorithm shows no further improvement (referred to as stabilization), or the production metrics of recall and precision appear to be sufficient, typically by reference to a random, representative control set designed to monitor progress. Training usually consists of a few thousand documents. The algorithm will then automatically classify the remaining documents or, alternatively, rank them to facilitate a manual classification. Once classified, the presumptively positive documents may or may not be reviewed and coded, but will not further train the algorithm.
TAR 1.0 applications can be further divided into simple passive learning (SPL) and simple active learning (SAL) protocols, depending upon the manner in which training documents are selected. With an SPL protocol, training documents are selected at random. The protocol is "simple" because there is a discrete training phase, after which training ceases regardless of further coding. It is passive because the algorithm does not select the random training documents. With a SAL protocol, the algorithm typically selects training documents from those about which the algorithm is the least certain. This is known as "uncertainty sampling," and it is considered an active protocol because the algorithm actively selects the training documents.
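The uncertainty-sampling step of a SAL protocol can be sketched in a few lines of Python. This is a hypothetical toy (the document IDs, scores and function name are assumptions): given each unreviewed document's predicted probability of being responsive, the algorithm selects the documents whose scores sit closest to 0.5, where it is least certain.

```python
def select_uncertain(scores, batch_size):
    """Uncertainty sampling (SAL): choose the unreviewed documents whose
    predicted probability of responsiveness is closest to 0.5, i.e. the
    documents the algorithm is least certain about."""
    return sorted(scores, key=lambda doc: abs(scores[doc] - 0.5))[:batch_size]

# Hypothetical model scores for four unreviewed documents.
scores = {"doc1": 0.95, "doc2": 0.51, "doc3": 0.10, "doc4": 0.48}
batch = select_uncertain(scores, batch_size=2)  # → ["doc2", "doc4"]
```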
With TAR 2.0, documents are continuously reviewed and coded to train the algorithm until enough positive documents have been located, reviewed, and coded to achieve production objectives. Training documents are primarily selected through relevance feedback, which focuses on documents the algorithm sees as most likely to be positive. This protocol is called continuous active learning (CAL). The protocol is “continuous” because every coding decision is used to train the algorithm. And again, it is active because the algorithm actively selects the training documents. This is typically accomplished by ranking the entire collection so the most likely positive documents at the top can be reviewed first.
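By contrast, the relevance-feedback step of CAL can be sketched as ranking the uncoded documents by predicted probability and reviewing from the top. Again a hypothetical toy, not any vendor's implementation: the scores would come from the trained supervised learning algorithm, and in practice each coded batch retrains the model before the collection is re-ranked.

```python
def next_cal_batch(scores, coded, batch_size):
    """Relevance feedback (CAL): rank every uncoded document by its
    predicted probability of responsiveness and return the top of the
    ranking for review. In a real CAL workflow, the new coding decisions
    retrain the algorithm and the whole collection is re-ranked."""
    uncoded = {doc: p for doc, p in scores.items() if doc not in coded}
    return sorted(uncoded, key=uncoded.get, reverse=True)[:batch_size]

# Hypothetical scores from a trained model; "a" has already been coded.
scores = {"a": 0.90, "b": 0.70, "c": 0.20, "d": 0.85}
batch = next_cal_batch(scores, coded={"a"}, batch_size=2)  # → ["d", "b"]
```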
Studies show that CAL (TAR 2.0) is typically more efficient than either TAR 1.0 protocol when the presumptively positive (e.g., responsive) documents will be reviewed. That is simply because, while TAR 1.0 training is very limited, the resultant presumptively positive set contains more negative documents than would be reviewed with CAL.
CAL also overcomes many of the practical obstacles to adoption that are inherent in the operation of TAR 1.0. A control set is not required, making it easier to handle rolling collections. There is no need for a subject matter expert (SME) to train the algorithm to avoid propagating erroneous decisions—CAL is noise tolerant, and our studies have shown that contract review attorneys train the algorithm as well as, and in some cases better than, an SME. Eliminating the SME also means that document review can start immediately, rather than waiting for an SME to code the control set and the training set. And the review can focus on the documents most likely to be positive (i.e., the best or most relevant documents), rather than the random or uncertain documents used to train TAR 1.0 applications.
Advances at the process level are most likely to come from operational refinements and workflow improvements to the CAL protocol. For example, studies show that more frequent ranking tends to improve CAL efficiency. And, since TAR operates at the document level, eliminating family batching will reduce the number of negative documents reviewed. J. Pickens, et al. “Break up the Family: Protocols for Efficient Recall-Oriented Retrieval Under Legally-Necessitated Dual Constraints.” Proceedings of the Second Annual Workshop on Big Data Analytics in the Legal Industry, IEEE Big Data 2018 (Seattle).
Advancing Legal Application
TAR is certainly moving in the direction of greater acceptance by the judiciary. Indeed, the court in Winfield v. City of New York, 2017 WL 5664852 (S.D.N.Y. 2017) essentially directed the use of TAR to improve the pace of discovery. And the New York Commercial Division adopted as Rule 11-e(f) the goal of using the most efficient review techniques, expressly including TAR. This trend will only continue as ESI collections grow, technical familiarity with TAR improves, and proportionality considerations prescribe efficiency.
Courts are necessarily refining the boundaries of cooperation and transparency surrounding TAR protocols, with particular emphasis on demonstrable production deficiencies. See Entrata v. Yardi Systems, No. 2:15-cv-00102 (D. Utah 2018) (rejecting a post hoc demand for sweeping disclosures); Winfield (directing production of a sample of nonresponsive documents to "increase transparency").
As parties become more sophisticated, there is a greater emphasis on the negotiation and use of TAR protocols in litigation. These protocols can be very comprehensive, addressing a wide range of issues such as keyword culling procedures, transparency obligations, and validation parameters. See In re Broiler Chicken Antitrust Litigation, No. 1:16-cv-08637 (N.D. Ill.) (No. 586).
Sophisticated parties are also taking maximum advantage of TAR techniques both inside and outside the courthouse. When comprehensive review may be unnecessary, such as second requests and subpoena responses, respondents may resort to TAR 1.0 protocols. Conversely, given that review begins immediately, CAL protocols are expanding into early case assessment, investigations and compliance monitoring.
Ultimately, with a clear understanding of the technology, TAR promises to see increasing utility, and significantly enhance document review on any number of fronts. Technological advances and workflow optimization will incrementally improve TAR efficiencies. And knowledgeable innovation will lead to ever-expanding application opportunities.
Thomas Gricks is managing director, Professional Services, Catalyst. Gricks advises corporations and law firms on best practices for applying TAR technology.