Technology: 5 steps for analyzing incoming productions
Leveraging Poisson mathematics will dramatically reduce the number of documents that need to be reviewed to understand what a collection actually says about the issues.
February 21, 2014 at 03:00 AM
The original version of this story was published on Law.com
It has become increasingly common for both in-house and defense counsel to find themselves confronted with the task of analyzing a large incoming document production. Incoming collections always present special challenges. The team may be less familiar with the language of the other side's documents, and many of the original players in the events may no longer be readily available for questioning. In spite of these difficulties, counsel must answer four fundamental questions:
- Did we receive the documents we asked for?
- Based on what we received, did we ask for the right things?
- What do these documents actually say about the issues?
- Which documents will become exhibits?
Examining large incoming collections can be very time-consuming and expensive, and most attorneys tend to have a low tolerance for these costs. This task is also not well suited to the various AI solutions that focus on determining what subjects a document is about, but not what the document actually says about the subject. In this article, I want to demonstrate an amazingly effective workflow for examining incoming collections. The workflow is based on the mathematical consequences of the Poisson distribution as applied to the observation of rare events.
The good news is that attorneys don't need to understand the “why” of the mathematics to take advantage of the “how.” Leveraging Poisson mathematics will dramatically reduce the number of documents that need to be reviewed to understand what a collection actually says about the issues. The method is easy to deploy, logical to understand and very, very useful. Here's how it works.
Step 1 – Create an organization structure
Define an organizational schema for the incoming collection or, put more simply, define a bunch of categories that will be used to organize the incoming documents. Most likely your team has already submitted a document request to the other side, and the categories defined in that request can form the basis of your schema. It's always good to review the request to make sure that there is as little overlap as possible between the categories.
Step 2 – Populate the categories
Identify the keywords and phrases that will be associated with each category. You will use these words and phrases to place the documents into the appropriate categories. I usually extract the vocabulary of the collection and arrange the root words by parts of speech to help with this task.
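As a rough illustration of Step 2, the sketch below sorts documents into categories by keyword match. The category names, keywords, and the build_index helper are hypothetical and are not taken from any particular review platform. Keywords are matched at the start of a word, so a root such as "recall" also catches "recalled".

```python
import re
from collections import defaultdict

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "pricing": ["price list", "discount", "rebate"],
    "safety": ["recall", "defect", "injury"],
}


def categorize(text, category_keywords):
    """Return the categories that have at least one keyword in the text."""
    lowered = text.lower()
    matched = []
    for category, keywords in category_keywords.items():
        # A leading word boundary lets a root word match its inflected forms.
        if any(re.search(r"\b" + re.escape(kw.lower()), lowered) for kw in keywords):
            matched.append(category)
    return matched


def build_index(documents, category_keywords):
    """Map each category to document ids; unmatched ids go to 'uncategorized'."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        categories = categorize(text, category_keywords)
        for category in categories or ["uncategorized"]:
            index[category].append(doc_id)
    return dict(index)


documents = {
    "d1": "We offered a 10% discount.",
    "d2": "The unit was recalled after an injury.",
    "d3": "Lunch on Friday?",
}
index = build_index(documents, CATEGORY_KEYWORDS)
```

A document that matches several categories is filed under each of them, which makes overlap between categories easy to spot, and the "uncategorized" bucket is the set examined in Step 3.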
Step 3 – Examine the categories
The first question is: Did we get what we expected? The second question is: What else did we get? To answer these questions, sample the documents that did not fit into categories. Can you prove with a 95 percent confidence level that all of these documents are not relevant? If you can, then you are ready to move on to Step 4.
If you can't, then one of two things must be true: the uncategorized documents contain some unexpected keywords that you must now add, or the collection contains documents about topics that you hadn't thought of as relevant and must now reconsider. Continue your cleanup until you can prove with 95 percent certainty that none of the uncategorized documents are relevant to your case.
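One way to read the 95 percent test, offered here as an assumption rather than as the article's stated procedure, is this: draw a random sample of the uncategorized documents, and if none of them is relevant, compute the largest relevant fraction that could still plausibly be hiding in the rest. The function name below is hypothetical, and the calculation treats the sample as independent random draws.

```python
def upper_bound_relevant_fraction(sample_size, confidence=0.95):
    """Upper bound on the relevant fraction after a sample with zero relevant hits.

    Assumes independent random draws. If the true fraction were p, the chance
    of seeing no relevant document in the sample is (1 - p) ** sample_size.
    Solving (1 - p) ** sample_size = 1 - confidence for p gives the bound.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


bound_300 = upper_bound_relevant_fraction(300)
bound_600 = upper_bound_relevant_fraction(600)
print(f"300 clean documents: at most {bound_300:.2%} relevant")
print(f"600 clean documents: at most {bound_600:.2%} relevant")
```

On this reading, a clean sample does not show that zero uncategorized documents are relevant. It shows, at the chosen confidence level, that the relevant fraction is below the bound, and a larger clean sample tightens that bound.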
Step 4 – Identify a document strategy for each category
Part of the reason for reviewing the incoming collection is to identify documents that will be used as exhibits. For each category, there are three strategic possibilities:
- We are seeking enough good documents to make our point.
- We need to find every possible example document.
- We hope to find a smoking gun.
It is important to define a strategy for each category so that you know when you have accomplished your goal and can move on to the next category.
Step 5 – Examine the documents
This is where the Poisson mathematics comes in. To explain the process, let's consider an example. Assume that the organizational schema consists of 50 categories and that each category has been populated with 2,000 documents.
Query: Do you need to read all 100,000 documents to understand what the collection says about each of the 50 issues? Poisson says “no.” You need only read 15,000.
The gist of the mathematics is as follows: To be 95 percent certain you have seen all of the relevant language that appears in more than 1 percent of the documents in the category (a “rare event”), you need only read 300 documents in that category. In other words, by reading 300 randomly selected documents from each category, you are 95 percent certain to see any relevant language that appears in at least 20 (1 percent) of the 2,000 documents in each category.
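The article does not spell out the formula behind the 300-document figure. The sketch below is one plausible reconstruction, assuming documents are drawn at random and treated as independent draws. Under the Poisson approximation, the chance of seeing no document containing a given piece of language is exp(-n * p), where n is the sample size and p is the fraction of documents containing it. The function names are hypothetical.

```python
import math


def exact_sample_size(min_fraction, confidence=0.95):
    """Smallest n such that 1 - (1 - p) ** n >= confidence (binomial model)."""
    miss_probability = 1.0 - confidence
    return math.ceil(math.log(miss_probability) / math.log(1.0 - min_fraction))


def poisson_sample_size(min_fraction, confidence=0.95):
    """Smallest n such that 1 - exp(-n * p) >= confidence (Poisson approximation)."""
    return math.ceil(-math.log(1.0 - confidence) / min_fraction)


per_category = poisson_sample_size(0.01)   # language present in 1 percent of documents
total_to_read = 50 * per_category          # 50 categories, as in the example
chance_of_seeing = 1.0 - 0.99 ** per_category

print(per_category, exact_sample_size(0.01), total_to_read)
print(f"chance of seeing 1-percent language in {per_category} documents: {chance_of_seeing:.3f}")
```

The Poisson approximation gives 300 documents per category and the exact binomial calculation gives 299, so the two agree closely at this scale. Across 50 categories the total is 15,000 documents.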
That is certainly enough language to both understand what the documents say about the issues and to find your exhibits – unless you are looking for a smoking gun, in which case you may have to do more work to reach the same certainty level.
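To give a sense of how much more work a smoking gun can take, the sketch below assumes the category holds exactly 2,000 documents and that documents are sampled at random without replacement, so the exact (hypergeometric) calculation applies. This calculation is not given in the article, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_find_at_least_one(category_size, target_count, sample_size):
    """Exact chance that a random sample contains at least one target document."""
    # comb() returns 0 when the sample is too large to avoid every target.
    miss = Fraction(
        comb(category_size - target_count, sample_size),
        comb(category_size, sample_size),
    )
    return 1 - miss


def docs_needed(category_size, target_count, confidence=Fraction(95, 100)):
    """Smallest sample size that finds at least one target with the given confidence."""
    for sample_size in range(category_size + 1):
        if prob_find_at_least_one(category_size, target_count, sample_size) >= confidence:
            return sample_size
    return category_size


single_smoking_gun = docs_needed(2000, 1)   # one decisive document in the category
one_percent_topic = docs_needed(2000, 20)   # language present in 20 documents

print(single_smoking_gun, one_percent_topic)
```

Under these assumptions, a lone smoking gun in a 2,000-document category requires reading 1,900 documents to reach 95 percent certainty, while language present in 20 documents needs somewhat fewer than 300.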
In summary, examining large incoming productions can be time-consuming and expensive. Moreover, it is not a task that is well suited to the various AI solutions. Fortunately, there is a simple method available for reducing the number of documents that must be reviewed in order to understand very accurately what the collection says about each topic. The method relies on information already known in the case and is easily executed. I encourage you to try it.
NOT FOR REPRINT
© 2025 ALM Global, LLC, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to [email protected]. For more information visit Asset & Logo Licensing.