A Lesson in Document Review, Spurred by Sen. Harris' Comments at Kavanaugh's Confirmation Hearing
Document review experts discuss how to review a large volume of documents, such as the roughly 42,000 pages dumped the day before the Brett Kavanaugh confirmation hearing began.
September 07, 2018 at 12:15 PM
At the start of Brett Kavanaugh's Supreme Court confirmation hearing on Tuesday, Sen. Kamala Harris suggested that the hearing should be delayed because of the inability to review some 42,000 pages of documents released the night before. “We can not possibly move forward with this hearing,” Harris, D-California, said.
The documents, released by a lawyer overseeing the Bush White House records, were related to Kavanaugh's time in the George W. Bush administration. Senate Minority Leader Chuck Schumer, D-New York, tweeted at 7:13 p.m. the night of the document dump that “Not a single senator will be able to review these records before tomorrow.”
Less than three hours after that Schumer tweet, the Senate Judiciary Committee tweeted at 9:50 p.m. that the Majority staff had completed its review of “each and every one of these pages.”
So who's right? Were those documents impossible to review? Was the Majority able to effectively search all of those documents in half the time it would take to binge-watch “Mindhunter” on Netflix?
According to document review experts who regularly deal with large quantities of documents in litigation, the answer is: It depends. It depends on the objective, type of documents, and whether it's known what—exactly—the reviewers want to find within them.
John Rosenthal, partner and chair of the e-discovery and information governance practice at Winston & Strawn, says that a skilled document reviewer can look at about 60 documents per hour, with each document averaging two to four pages, for a total of 120-240 pages per hour per person.
“I don't know what the Senate staff looks like and I don't know who is looking at those documents. Even without a tool set this is not an insurmountable amount of documents,” he said.
If that still seems a little sluggish, Rosenthal said the process can move even faster when search tools are introduced.
“[The Senate Judiciary staff] are trying to find documents of interest to them so they can ask questions about them to get the witness' perspective on them,” he said. “I would be shocked if they are going to review [everything]. You wouldn't necessarily review all those documents, what you're going to do is figure out what issues are more important to you then prioritize the documents according to those issues and probably look at the stuff that is more relevant to what you are looking for.”
But Gareth Evans, a Redgrave partner who deals with large quantities of documents in his practice, with pages numbering in the tens of thousands, cautions about the limitations of search terms.
“It seems to assume you're looking for documents on a particular subject and it may be presumptuous to limit yourself to a particular topic by using keywords because the keyword search will obviously only identify documents with those terms in them,” he said. “If you're using keywords that miss entire topics then the search is not an effective one.”
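Evans' point can be illustrated with a minimal sketch. The documents, names and search term below are hypothetical, invented purely for illustration, and the matching is a bare exact-word lookup rather than what any particular review platform does.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a document and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_hits(documents: dict[str, str], keywords: set[str]) -> list[str]:
    """Return names of documents containing at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return sorted(name for name, text in documents.items() if tokenize(text) & wanted)


# Hypothetical documents, invented for illustration.
docs = {
    "doc_1": "Meeting notes on the budget proposal",
    "doc_2": "Draft memo on appropriations and spending levels",
    "doc_3": "Lunch schedule for Thursday",
}

hits = keyword_hits(docs, {"budget"})
print(hits)  # ['doc_1']
```

Here doc_2 covers the same subject as doc_1 but never uses the word "budget," so the search passes over it. That is the kind of gap Evans describes.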
Turning back to the Kavanaugh hearing, is it possible that all those documents were reviewed in those few short hours? According to the Washington Post, there were 5,148 documents totaling 42,390 pages related to Kavanaugh's service in the Bush administration. Rosenthal said that the 60-documents-per-hour standard assumes documents that average two to four pages. The Kavanaugh documents average about eight pages each.

Using the top of Rosenthal's range, 240 pages an hour per staff member, and generously assuming a well-orchestrated review strategy at the outset, a team of 15 would need roughly 11.75 hours to sift through 42,390 pages (42,390 pages divided among 15 reviewers at 240 pages an hour each). That is nearly double the time in which the Senate Judiciary Committee's majority staff claims to have completed the task, but still well within the span of the four-day confirmation hearing. So while it's not ideal, and questions about thoroughness can be raised, it's not necessarily the impossible task that Harris and Schumer made it out to be.
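The back-of-the-envelope math above can be reproduced in a few lines. This is a rough sketch: the page and document counts come from the Washington Post figures cited here, the 120 and 240 pages-per-hour rates are the two ends of Rosenthal's range, and the team of 15 is an assumption, not a known staffing figure. It also assumes every reviewer works in parallel at a uniform pace with no setup time.

```python
def review_hours(total_pages: int, pages_per_hour: float, reviewers: int) -> float:
    """Hours needed for a team reviewing in parallel at a uniform rate."""
    if pages_per_hour <= 0 or reviewers <= 0:
        raise ValueError("pages_per_hour and reviewers must be positive")
    return total_pages / (pages_per_hour * reviewers)


TOTAL_PAGES = 42_390      # page count reported by the Washington Post
TOTAL_DOCUMENTS = 5_148   # document count reported by the Washington Post
TEAM_SIZE = 15            # assumed team size, not a known figure

avg_pages_per_doc = TOTAL_PAGES / TOTAL_DOCUMENTS
fast = review_hours(TOTAL_PAGES, 240, TEAM_SIZE)  # top of Rosenthal's range
slow = review_hours(TOTAL_PAGES, 120, TEAM_SIZE)  # bottom of Rosenthal's range

print(f"average pages per document: {avg_pages_per_doc:.1f}")
print(f"hours at 240 pages/hour: {fast:.2f}")
print(f"hours at 120 pages/hour: {slow:.2f}")
```

At the slower end of the range, the same team would need about twice as long, so the estimate is sensitive to how quickly each reviewer actually moves.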
For a better understanding of how reviewers undertake a project of this magnitude, here's Rosenthal's breakdown of the approach used to cull through large quantities of documents:
The Technical Breakdown
Hashing
“[To review large data sets] the first thing you do, in litigation context, is you would put this through a processing engine which is going to generate a fingerprint for each document. … And it basically is going to calculate what would be a unique document number for each document.
If you run the data set through the hashing algorithm it will tell you immediately if two documents have the same hash number, then that means they are 100 percent identical. There are also technologies that can show near duplication, so it may be a draft of the same document,” Rosenthal said.
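A minimal sketch of the fingerprinting idea follows. Rosenthal does not name a specific algorithm, so the use of SHA-256 here is an assumption, and the sample documents and file names are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used as the document's fingerprint."""
    return hashlib.sha256(content).hexdigest()


def group_exact_duplicates(documents: dict[str, bytes]) -> list[list[str]]:
    """Group document names whose contents hash to the same value."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for name, content in documents.items():
        by_hash[fingerprint(content)].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical sample documents, invented for illustration.
sample_docs = {
    "memo_a.txt": b"Please review the attached draft.",
    "memo_b.txt": b"Please review the attached draft.",
    "memo_c.txt": b"Please review the attached draft!",
}

duplicates = group_exact_duplicates(sample_docs)
print(duplicates)  # [['memo_a.txt', 'memo_b.txt']]
```

Note that memo_c.txt differs by a single character and so gets a completely different fingerprint. Hashing catches only exact copies, which is why near-duplicate detection is a separate technology.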
Email Threading
“Then on the email side we have email threading tools [that show email chains]. With email threading technology, you press a button and it brings all the thread together in one place that are related and it will tell you out of all of these emails these two are the most inclusive. They are going to have the entire conversations, so you only need to look at these two instead of the entire thread set,” Rosenthal said.
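The idea of "most inclusive" emails can be sketched under a simplifying assumption: each email is modeled as the ordered chain of messages it contains, oldest first, including everything quoted below the new reply. Real threading tools work from headers and message text; the thread, names and message IDs here are hypothetical.

```python
def inclusive_emails(thread: dict[str, tuple[str, ...]]) -> list[str]:
    """Return emails whose message chain is not fully contained in a longer one.

    An email is treated as redundant when its chain is a proper prefix of
    another email's chain, meaning the later email quotes all of it.
    """
    inclusive = []
    for name, chain in thread.items():
        covered = any(
            len(other) > len(chain) and other[: len(chain)] == chain
            for other_name, other in thread.items()
            if other_name != name
        )
        if not covered:
            inclusive.append(name)
    return sorted(inclusive)


# Hypothetical thread: one original message, one reply, then two replies
# that branch off in different directions.
thread = {
    "original": ("m1",),
    "reply_2": ("m1", "m2"),
    "reply_3a": ("m1", "m2", "m3"),
    "reply_3b": ("m1", "m2", "m4"),
}

to_review = inclusive_emails(thread)
print(to_review)  # ['reply_3a', 'reply_3b']
```

In this toy thread, reading the two branch-end replies covers all four emails, which mirrors Rosenthal's "these two instead of the entire thread set."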
Keyword Search / Clustering
“The next one you would use is really looking at search terms and running search terms to try to identify through keywords those documents that are important. The next set of tools you would use is a clustering engine. You take your data set, press a button and it's going to put everything into related categories. So for example, let's use 'Thanksgiving,' the clustering engine is then going to give me a bucket of things related to Thanksgiving, like turkey, cranberries, green beans, football, etc,” Rosenthal said.
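A toy version of a clustering pass, borrowing Rosenthal's Thanksgiving example, might look like the sketch below. It is not how any commercial clustering engine works: it uses a simple word-overlap (Jaccard) score and a greedy single pass, and the 0.2 threshold, the function names and the sample documents are all assumptions made for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a document and split it into a set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two documents have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents: dict[str, str], threshold: float = 0.2) -> list[list[str]]:
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters: list[tuple[set[str], list[str]]] = []
    for name, text in documents.items():
        doc_tokens = tokens(text)
        for seed_tokens, members in clusters:
            if jaccard(doc_tokens, seed_tokens) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((doc_tokens, [name]))
    return [members for _, members in clusters]


# Hypothetical documents, invented for illustration.
docs = {
    "menu_1": "turkey and cranberries for thanksgiving dinner",
    "menu_2": "thanksgiving dinner menu turkey green beans",
    "budget_1": "quarterly budget report for the finance team",
    "budget_2": "finance team budget meeting",
}

buckets = cluster(docs)
print(buckets)  # [['menu_1', 'menu_2'], ['budget_1', 'budget_2']]
```

Unlike a keyword search, nobody told the code to look for "Thanksgiving" or "budget." The buckets emerge from shared vocabulary alone.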
Predictive Coding
“Then the next tool in the arsenal is predictive coding engine where you are training the engine. You use 2 or 3 percent of the document set and you train the algorithm as to what you are looking for and once the algorithm is trained you run the rest of the 97 percent of the data through a classifier and it will put categorized things into different buckets for you. On a 24-hour basis it would be pretty tough to run a predictive coding exercise,” Rosenthal said.
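The train-then-classify workflow Rosenthal describes can be sketched with a small text classifier. Rosenthal does not say which algorithm predictive coding tools use, so the naive Bayes approach here is a stand-in chosen for brevity, and the labels, training examples and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled: list[tuple[str, str]]) -> dict:
    """Count words per label from a small reviewer-coded training set."""
    word_counts: dict[str, Counter] = {}
    doc_counts: Counter = Counter()
    for text, label in labeled:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return {"words": word_counts, "docs": doc_counts, "vocab": vocabulary}


def classify(model: dict, text: str) -> str:
    """Pick the label with the highest naive Bayes log-probability."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, counts in model["words"].items():
        score = math.log(model["docs"][label] / total_docs)
        total_words = sum(counts.values())
        for word in tokens(text):
            # Laplace smoothing keeps unseen words from zeroing out a label.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set: the small slice a human reviewer codes by hand.
training_set = [
    ("contract renewal terms and pricing", "relevant"),
    ("pricing schedule for the contract", "relevant"),
    ("office holiday party on friday", "not_relevant"),
    ("lunch menu for the party", "not_relevant"),
]
model = train(training_set)

# The rest of the collection is then sorted into buckets automatically.
print(classify(model, "updated contract pricing"))  # relevant
print(classify(model, "friday lunch plans"))        # not_relevant
```

The hand-coding step is the bottleneck: someone has to review and label the training slice before the classifier is of any use, which fits Rosenthal's remark that the exercise is hard to run on a 24-hour clock.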
© 2024 ALM Global, LLC, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to [email protected]. For more information visit Asset & Logo Licensing.