Technology: 5 steps for analyzing incoming productions
Leveraging Poisson mathematics will dramatically reduce the number of documents that need to be reviewed to understand what a collection actually says about the issues.
February 21, 2014 at 03:00 AM
The original version of this story was published on Law.com
It has become increasingly common for both in-house and defense counsel to find themselves confronted with the task of analyzing a large incoming document production. Incoming collections always present special challenges. The team may be less familiar with the language of the other side's documents, and many of the original players in the events may no longer be readily available for questioning. In spite of these difficulties, counsel must answer four fundamental questions:
- Did we receive the documents we asked for?
- Based on what we received, did we ask for the right things?
- What do these documents actually say about the issues?
- Which documents will become exhibits?
Examining large incoming collections can be very time-consuming and expensive, and most attorneys have a low tolerance for these costs. This task is also not well suited to the various AI solutions that focus on determining what subjects a document is about, but not what the document actually says about the subject. In this article, I want to demonstrate an effective workflow for examining incoming collections. The workflow is based on the mathematical consequences of the Poisson distribution as applied to the observation of rare events.
The good news is that attorneys don't need to understand the “why” of the mathematics to take advantage of the “how.” Leveraging Poisson mathematics will dramatically reduce the number of documents that need to be reviewed to understand what a collection actually says about the issues. The method is easy to deploy, logical to understand and very, very useful. Here's how it works.
Step 1 – Create an organization structure
Define an organizational schema for the incoming collection or, put more simply, define a bunch of categories that will be used to organize the incoming documents. Most likely your team has already submitted a document request to the other side, and the categories defined in that request can form the basis of your schema. It's always good to review the request to make sure that there is as little overlap as possible between the categories.
Step 2 – Populate the categories
Identify the keywords and phrases that will be associated with each category. You will use these words and phrases to place the documents into the appropriate categories. I usually extract the vocabulary of the collection and arrange the root words by parts of speech to help with this task.
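As a rough illustration of this step, the sketch below counts the vocabulary of a collection and then places each document into every category whose keywords it contains. The category names, keywords and sample documents are hypothetical, and the matching is on exact lowercase words only; it does not reduce words to their roots or sort them by parts of speech, so a real deployment would need more than this.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, for illustration only. In practice
# they would come from the document request and the collection's vocabulary.
CATEGORY_KEYWORDS = {
    "pricing": ["price", "discount", "rebate"],
    "safety": ["recall", "defect", "injury"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary(documents):
    """Count how often each word occurs across the whole collection."""
    counts = Counter()
    for text in documents.values():
        counts.update(tokenize(text))
    return counts


def categorize(documents, category_keywords):
    """Map each category to the ids of documents containing any of its keywords.

    Documents that match no category are collected under the key None.
    """
    result = {name: [] for name in category_keywords}
    result[None] = []
    for doc_id, text in documents.items():
        words = set(tokenize(text))
        matched = [name for name, keywords in category_keywords.items()
                   if words & set(keywords)]
        for name in matched:
            result[name].append(doc_id)
        if not matched:
            result[None].append(doc_id)
    return result


# Hypothetical sample documents.
documents = {
    "d1": "The price includes a rebate.",
    "d2": "Recall notice: defect found.",
    "d3": "Lunch on Friday?",
}
placed = categorize(documents, CATEGORY_KEYWORDS)
word_counts = vocabulary(documents)
```

The documents collected under `None` are the uncategorized ones that Step 3 examines.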
Step 3 – Examine the categories
The first question is: Did we get what we expected? The second question is: What else did we get? To answer these questions, sample the documents that did not fit into any category. Can you show, at a 95 percent confidence level, that none of these documents is relevant? If you can, then you are ready to move on to Step 4.
If you can't, then one of two things must be true: the uncategorized documents contain some unexpected keywords that you must now add, or the collection contains documents about topics that you hadn't thought of as relevant and must now reconsider. Continue your cleanup until you can prove with 95 percent certainty that none of the uncategorized documents are relevant to your case.
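One way to read the 95 percent test in this step is sketched below. It assumes a simple random sample of the uncategorized documents and uses the Poisson approximation. Strictly speaking, a sample in which no relevant document turns up cannot prove that none exists; what it supports is an upper limit, at the chosen confidence level, on the share of relevant documents left in the pile. The function names and the figure of 5,000 uncategorized documents are hypothetical.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Pick a simple random sample of document ids without replacement."""
    ids = list(doc_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


def max_relevant_rate(sample_size, confidence=0.95):
    """Upper limit on the relevant share when the sample shows no relevant documents.

    Poisson approximation: P(no relevant document in n) = exp(-rate * n).
    Setting that probability to 1 - confidence and solving for rate gives the limit.
    """
    return -math.log(1.0 - confidence) / sample_size


# Hypothetical pile of 5,000 uncategorized documents.
uncategorized = range(5000)
to_review = draw_sample(uncategorized, 300)
limit = max_relevant_rate(len(to_review))
print(f"If none of the {len(to_review)} sampled documents is relevant, "
      f"the relevant share is below {limit:.2%} at 95 percent confidence.")
```

If a relevant document does turn up in the sample, the bound no longer applies and the cleanup described above continues.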
Step 4 – Identify a document strategy for each category
Part of the reason for reviewing the incoming collection is to identify documents that will be used as exhibits. For each category, there are three strategic possibilities:
- We are seeking enough good documents to make our point.
- We need to find every possible example document.
- We hope to find a smoking gun.
It is important to define a strategy for each category so that you know when you have accomplished your goal and can move on to the next category.
Step 5 – Examine the documents
This is where the Poisson mathematics comes in. To explain the process, let's consider an example. Assume that the organizational schema consists of 50 categories and that each category has been populated with 2,000 documents.
Query: Do you need to read all 100,000 documents to understand what the collection says about each of the 50 issues? Poisson says “no.” You need only read 15,000.
The gist of the mathematics is as follows: To be 95 percent certain you have seen the relevant language that appears in more than 1 percent of the documents in the category (a “rare event”), you need only read 300 randomly selected documents in that category. In other words, by reading 300 randomly selected documents from each category, you are 95 percent certain to see any relevant language that appears in more than 20 (1 percent) of the 2,000 documents in that category.
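The arithmetic behind the 300 and 15,000 figures can be checked with a short sketch. It assumes the standard reading of the Poisson argument: if language appears in a fraction p of the documents, the chance that n randomly chosen documents contain none of it is approximately exp(-p * n), and n is chosen so that this chance is at most 5 percent. The function names are illustrative, and the exact binomial version is included only for comparison.

```python
import math


def sample_size_poisson(rate, confidence=0.95):
    """Documents to read so that language present at `rate` is seen at least once.

    Poisson approximation: P(never seen) = exp(-rate * n) <= 1 - confidence.
    """
    return math.ceil(-math.log(1.0 - confidence) / rate)


def sample_size_binomial(rate, confidence=0.95):
    """Same question with the exact binomial: (1 - rate) ** n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


categories = 50
docs_per_category = 2000

per_category = sample_size_poisson(0.01)   # 300
total_to_read = categories * per_category  # 15000
collection_size = categories * docs_per_category

print(per_category, total_to_read, collection_size)
print(sample_size_binomial(0.01))  # 299, one fewer than the Poisson figure
```

Note that the sample size depends on the 1 percent threshold and the confidence level, not on the 2,000 documents in the category, which is why the savings grow as categories get larger.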
That is certainly enough language both to understand what the documents say about the issues and to find your exhibits, unless you are looking for a smoking gun. In that case, you may have to do a bit more work to reach the certainty level.
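To get a feel for how much more work a smoking gun can take, the sketch below treats it as a single document hidden among the 2,000 in a category, which is an assumption made for illustration. Because the target is so rare, it uses the exact probability for sampling without replacement rather than the Poisson approximation. The function names are hypothetical.

```python
import math


def miss_probability(population, hits, sample_size):
    """Chance that a random sample without replacement contains none of the `hits` documents."""
    return (math.comb(population - hits, sample_size)
            / math.comb(population, sample_size))


def sample_size_for_rare_documents(population, hits, confidence=0.95):
    """Smallest sample whose chance of missing every hit is at most 1 - confidence."""
    for n in range(population + 1):
        if miss_probability(population, hits, n) <= 1.0 - confidence:
            return n
    return population


# One smoking gun among 2,000 documents (assumed for illustration).
single = sample_size_for_rare_documents(2000, 1)
# For comparison: language found in 20 of the 2,000 documents (1 percent).
one_percent = sample_size_for_rare_documents(2000, 20)
print(single, one_percent)
```

Under that assumption, random sampling has to cover 1,900 of the 2,000 documents to be 95 percent sure of including the one that matters, while the 1 percent case comes in under the 300 documents discussed above.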
In summary, examining large incoming productions can be time-consuming and expensive. Moreover, it is not a task that is well suited to the various AI solutions. Fortunately, there is a simple method available for reducing the number of documents that must be reviewed in order to understand very accurately what the collection says about each topic. The method relies on information already known in the case and is easily executed. I encourage you to try it.
© 2024 ALM Global, LLC, All Rights Reserved.