Large-scale document review poses significant challenges to an organization. A number of factors are responsible, but they generally relate to what I call the document review conundrum: the constant challenge of dealing with large amounts of electronically stored information, with ever-increasing volumes and ever-shrinking budgets and timelines. Worse still, the risks of a misstep are significant. The good news is that a solution is finally on the horizon, in the form of technology-assisted review. But as one problem is addressed, another arises: many are now trying to understand the differences among the various forms of this emerging technology category and to learn how to get the most out of their investment. This article addresses these issues head on.

Understanding technology-assisted review

Technology-assisted review is quickly emerging as a viable and important solution in e-discovery because it streamlines the process of identifying potentially relevant data within a large document collection, delivering accuracy as good as (or better than) traditional manual review while reducing time and cost by as much as 70 to 80 percent.

Not all technology-assisted review is the same, however. There are two broad sub-categories available in the market, each with clear strengths:

  1. An artificial intelligence-based approach that leverages computer intelligence to identify responsive data. This approach generally achieves initial results more quickly, but isn't particularly transparent given the complex mathematics that underpin the technology.
  2. A language-based approach that relies on a human's understanding of language to identify responsive data. This approach works as humans think and presents time and cost savings similar to the artificial intelligence approach with a higher degree of transparency, auditability and work product reusability (from one matter to the next).

Regardless of which option you choose, it's important that you take every precaution to get the most out of your investment. Note: In some cases, you can also apply a combination of the two to take advantage of the benefits inherent to each approach.

Best practices for artificial intelligence-based methodologies

To understand how to get the most out of the artificial intelligence-based alternative, it's first important to understand seed sets. This methodology works by asking a human reviewer to identify a sample set of documents in the collection that contain the concepts that she is looking for. Once the reviewer provides this seed set, the organization can use artificial intelligence to go look for “more like this.”

If the reviewer provides adequate training, the system works correctly. If she does not, however, the results can be sub-optimal, even when compared to manual review. For instance, it is rare that a human reviewing documents one at a time will mistag 5,000 of them in the same way, but an artificial intelligence system trained on a flawed seed set certainly can.

Thus, a best practice is to build a large enough seed set to cover all your bases. A few years ago, the common practice was to select as few as 500 documents to provide this training. With data volumes increasing and greater education available on how artificial intelligence works, more organizations are building significantly larger seed sets (generally 10,000 or more documents). Why? Because computers do not understand context as humans do. For instance, a computer doesn't understand that “Mitt is running for President” means the same thing as “Mitt threw his hat in the ring.” It is therefore important to capture all concepts and semantic patterns up front to maximize the chances that the computer catches as many instances of the key expressions as possible. You'll also want to quality control the results to monitor/limit what the computer may miss.
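To make the "more like this" idea concrete, here is a minimal sketch of how a seed-set match might work. This is a toy illustration only: real technology-assisted review products use far more sophisticated models, and the `jaccard` word-overlap scoring, the `more_like_this` function and the 0.3 threshold below are all invented for this example. Note how it reproduces the limitation described above: the "hat in the ring" phrasing shares almost no vocabulary with the seed document, so the computer misses it.

```python
# Toy sketch of the "more like this" step behind an AI-based seed set.
# A document is scored by its word overlap (Jaccard similarity) with each
# seed document; names and the threshold are illustrative, not from any product.

def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity between two bags of words."""
    return len(a & b) / len(a | b) if a | b else 0.0

def more_like_this(seed_docs, collection, threshold=0.3):
    """Return collection documents whose best similarity to any
    seed document meets the threshold."""
    seeds = [set(d.lower().split()) for d in seed_docs]
    flagged = []
    for doc in collection:
        words = set(doc.lower().split())
        if max(jaccard(words, s) for s in seeds) >= threshold:
            flagged.append(doc)
    return flagged

seed = ["mitt is running for president"]
docs = ["mitt is running for office",
        "mitt threw his hat in the ring"]
print(more_like_this(seed, docs))  # only the first document is flagged
```

Because the second document is missed entirely, a larger seed set that also contains "threw his hat in the ring" phrasing would be needed, which is exactly why seed sets have grown from hundreds to many thousands of documents.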

Best practices for language-based analytics methodologies

Again, language-based methodologies can deliver time and cost savings similar to the artificial intelligence approach, with the added benefits of transparency, auditability and reusability, but only if you use them correctly.

Because you are not leveraging artificial intelligence to help you identify what you are looking for, turn the process around and first identify what clearly isn't relevant. You can accomplish this by analyzing the language within the collection, and setting up a workflow that allows you to put documents into three buckets: documents that are clearly relevant, documents that are clearly not relevant and those that require additional analysis. Often, you'll find that at least 40 percent of the collection is clearly not relevant based on your language analysis, and therefore you can quickly and defensibly set aside these documents for significant time efficiencies.
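The three-bucket workflow can be sketched as a simple triage rule. This is a deliberately simplified illustration: the `bucket` function and the two term lists are hypothetical, and in a real matter the lists would come from analysis of the collection's actual language rather than being hard-coded.

```python
# Toy sketch of language-based three-bucket triage.
# The term lists are invented for illustration; a real workflow derives
# them from analysis of the language actually used in the collection.

RELEVANT_TERMS = {"merger", "acquisition"}   # hypothetical case language
IRRELEVANT_TERMS = {"lunch", "birthday"}     # hypothetical noise language

def bucket(doc: str) -> str:
    """Assign a document to one of the three review buckets."""
    words = set(doc.lower().split())
    if words & RELEVANT_TERMS:
        return "clearly relevant"
    if words & IRRELEVANT_TERMS:
        return "clearly not relevant"
    return "needs analysis"

print(bucket("discuss the merger terms"))  # clearly relevant
print(bucket("team lunch on friday"))      # clearly not relevant
print(bucket("call me tomorrow"))          # needs analysis
```

Everything that lands in the "clearly not relevant" bucket can be set aside defensibly, which is where the 40-percent-plus time savings described above come from; only the "needs analysis" bucket goes on to closer human review.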

Once you have set aside those documents that clearly aren't relevant, then go through the remainder (i.e., the “might be relevant” pile) and have your reviewer highlight the language within the document that she feels makes the document relevant. This will provide you with added insight into how well the review team understands the issue at hand, and will allow you to make real-time adjustments as necessary.

Best practices for either alternative

Regardless of the technology-assisted review approach you use, keep in mind that these are powerful methodologies, but they can also deliver powerfully bad results (yielding risk and hidden costs) if not used properly. An advanced fighter jet can be effective when flown by a trained pilot, but in the hands of someone with only marginal training, it can crash right into the broad side of a mountain.

If you do not have the in-house resources (technology, infrastructure and people) or training processes in place to ensure optimal results, look for an outside resource to provide assistance and expert counsel. Find someone with deep domain, technology and process expertise—ideally someone who supports both artificial intelligence-based and language-based methodologies, so he can compare and contrast the two and advise you on when to use which alternative.

With these best practices, and with supporting case law now emerging, technology-assisted review can significantly accelerate the review process and deliver cost and time savings with as much accuracy and defensibility as the traditional manual review process.
