Review acceleration: Getting the most from technology-assisted review
Large-scale document review poses significant challenges to an organization. A number of factors are responsible, but they generally relate to what I call the document review conundrum: the constant challenge of dealing with large amounts of electronically stored information, with ever-increasing volumes and ever-decreasing budgets and time.
November 07, 2012 at 03:30 AM
11 minute read
The original version of this story was published on Law.com
Large-scale document review poses significant challenges to an organization. A number of factors are responsible, but they generally relate to what I call the document review conundrum: the constant challenge of dealing with large amounts of electronically stored information, with ever-increasing volumes and ever-decreasing budgets and time. Worse still, the risks of making a misstep are very significant. The good news is that a solution is finally on the horizon, in the form of technology-assisted review. But as one problem is addressed, another is created in its place. Today, many are attempting to understand the differences among the various forms of this emerging technology category and to learn how they can get the most out of their investment. This article will address these issues head on.
Understanding technology-assisted review
Technology-assisted review is quickly emerging as a viable and important solution in e-discovery because it helps streamline the process of identifying potentially relevant data within a large document collection, delivering accuracy as good as (or better than) traditional manual review, while reducing time and cost by as much as 70 to 80 percent.
Not all technology-assisted review is the same, however. There are two broad sub-categories available in the market, each with clear strengths:
- An artificial intelligence-based approach that leverages computer intelligence to identify responsive data. This approach generally achieves initial results more quickly, but isn't particularly transparent given the complex mathematics that underpin the technology.
- A language-based approach that relies on a human's understanding of language to identify responsive data. This approach works as humans think and presents time and cost savings similar to the artificial intelligence approach with a higher degree of transparency, auditability and work product reusability (from one matter to the next).
Regardless of which option you choose, it's important that you take every precaution to get the most out of your investment. Note: In some cases, you can also apply a combination of the two to take advantage of the benefits inherent to each approach.
Best practices for artificial intelligence-based methodologies
To understand how to get the most out of the artificial intelligence-based alternative, it's first important to understand seed sets. This methodology works by asking a human reviewer to identify a sample set of documents in the collection that contain the concepts that she is looking for. Once the reviewer provides this seed set, the organization can use artificial intelligence to go look for “more like this.”
If the reviewer provides adequate training, the system works correctly. If she does not, however, the results can be sub-optimal, even compared to manual review. For instance, it is rare that a human reviewing documents one at a time will mistag 5,000 of them the same way, but a poorly trained system certainly can.
Thus, a best practice is to build a large enough seed set to cover all your bases. A few years ago, the common practice was to select as few as 500 documents to provide this training. With data volumes increasing and greater education available on how artificial intelligence works, more organizations are building significantly larger seed sets (generally 10,000 or more documents). Why? Because computers do not understand context as humans do. For instance, a computer doesn't understand that “Mitt is running for President” means the same thing as “Mitt threw his hat in the ring.” It is therefore important to capture all concepts and semantic patterns up front to maximize the chances that the computer catches as many instances of the key expressions as possible. You'll also want to quality control the results to monitor/limit what the computer may miss.
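The "more like this" step above can be sketched in miniature. The example below is an illustration only, not any vendor's actual algorithm: it pools the seed set into a bag-of-words centroid and ranks the collection by cosine similarity, whereas production systems use trained statistical classifiers over much richer features. The document text and the threshold are hypothetical.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def more_like_this(seed_docs, collection, threshold=0.2):
    """Rank unreviewed documents by similarity to the seed-set centroid.

    A small or lopsided seed set skews the centroid, which is why larger,
    more varied seed sets give better coverage of the concepts at issue.
    """
    centroid = Counter()
    for doc in seed_docs:
        centroid.update(vectorize(doc))
    scored = [(cosine(vectorize(d), centroid), d) for d in collection]
    return [d for score, d in sorted(scored, reverse=True) if score >= threshold]

seeds = ["Mitt is running for President", "Mitt threw his hat in the ring"]
docs = ["The President race heats up as Mitt is running",
        "Quarterly revenue report attached"]
hits = more_like_this(seeds, docs)
```

Note how the sketch exposes the limitation discussed above: "Mitt threw his hat in the ring" shares almost no vocabulary with a document about the race, so unless both phrasings appear in the seed set, one of them will be missed.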
Best practices for language-based analytics methodologies
Again, language-based methodologies can deliver time and cost savings similar to the artificial intelligence approach, with the added benefits of transparency, auditability and reusability, but only if you use them correctly.
Because you are not leveraging artificial intelligence to help you identify what you are looking for, turn the process around and first identify what clearly isn't relevant. You can accomplish this by analyzing the language within the collection and setting up a workflow that sorts documents into three buckets: documents that are clearly relevant, documents that are clearly not relevant, and those that require additional analysis. Often, you'll find that at least 40 percent of the collection is clearly not relevant based on your language analysis, so you can quickly and defensibly set aside those documents for significant time savings.
Once you have set aside those documents that clearly aren't relevant, then go through the remainder (i.e., the “might be relevant” pile) and have your reviewer highlight the language within the document that she feels makes the document relevant. This will provide you with added insight into how well the review team understands the issue at hand, and will allow you to make real-time adjustments as necessary.
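The three-bucket workflow described above can be sketched as a simple triage pass. This is a deliberately simplified illustration assuming plain substring matching against hypothetical phrase lists; real language-based tools analyze the collection's language far more deeply before sorting anything.

```python
def triage(documents, relevant_phrases, irrelevant_phrases):
    """Sort documents into three buckets based on the language they contain.

    Documents matching no known phrase fall into the "needs_review" bucket,
    which is the smaller pile that human reviewers then examine in depth.
    """
    buckets = {"relevant": [], "not_relevant": [], "needs_review": []}
    for doc in documents:
        text = doc.lower()
        if any(p in text for p in relevant_phrases):
            buckets["relevant"].append(doc)
        elif any(p in text for p in irrelevant_phrases):
            buckets["not_relevant"].append(doc)
        else:
            buckets["needs_review"].append(doc)
    return buckets

docs = ["Privilege memo re merger terms",
        "Fantasy football picks for week 9",
        "Lunch plans"]
buckets = triage(docs, ["merger"], ["fantasy football"])
```

Note the order of the checks: a document is set aside as not relevant only if nothing in it matches the relevant-language list, which is what makes the set-aside defensible.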
Best practices for either alternative
Regardless of the technology-assisted review approach you use, keep in mind that these are powerful methodologies, but they can also deliver powerfully bad results (yielding risk and hidden costs) if not used properly. An advanced fighter jet can be effective when flown by a trained pilot, but in the hands of someone with only marginal training, it can crash right into the broad side of a mountain.
If you do not have the in-house resources (technology, infrastructure and people) or training processes in place to ensure optimal results, look for an outside resource to provide assistance and expert counsel. Find someone with deep domain, technology and process expertise—ideally someone who supports both artificial intelligence-based and language-based methodologies so he can compare and contrast each and provide guidance to you on when to use which alternative.
With these best practices, and with supporting case law now emerging, technology-assisted review can significantly accelerate the review process and deliver cost and time savings with as much accuracy and defensibility as the traditional manual review process.