Hillary Clinton campaigning in Wilton Manors, FL. October 30, 2016. (Photo: Gregory Reed/Shutterstock.com)

In raw numbers, it can seem a daunting task: 650,000 emails, eight days, and arguably, one presidential election hanging in the balance.

But the FBI’s review of emails related to Hillary Clinton’s handling of classified information on her private email server concluded just over a week after it was announced. The emails, tied to the former secretary of state and current Democratic presidential nominee, had only recently been found during the bureau’s unrelated investigation of a device belonging to former Congressman Anthony Weiner.

The aftermath of the FBI’s announcement was as many expected only two days out from the election. While the Clinton campaign spoke of vindication, Republican presidential nominee Donald Trump’s campaign spoke of foul play. At a rally in Michigan following the FBI’s announcement, Trump reiterated his claims the election was being rigged: “You can’t review 650,000 new emails in eight days,” he told the crowd. “You can’t do it, folks.”

Trump’s incredulity was passionately echoed by his campaign surrogates online. Retired lieutenant general Michael Flynn, for example, tweeted: “IMPOSSIBLE: There R 691,200 seconds in 8 days. DIR Comey has thoroughly reviewed 650,000 emails in 8 days? An email / second?”

 

Modern document review, however, is not about reviewing all documents in their entirety. The reviewers, after all, did not need to search through all 650,000 emails, but only those sent or received by Clinton through her personal server and, if necessary, those belonging to her aide Huma Abedin and any other source deemed germane to the investigation.

But even those targeted emails still may not have all been considered for review. According to NBC News, nearly all the emails from Weiner’s device were duplicates of emails the FBI already had.

For Chris Chapman, regional vice president of managed review services at Xact Data Discovery, this situation was far from unusual in such a voluminous and ongoing review. “We commonly see email data sets added to ongoing matters that are 90 to 95 percent or more duplicative of content that was already reviewed. In this case, I’d be surprised if more than 50,000-75,000 emails required review, and I suspect the number here was actually far less than that.”
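As a rough illustration of Chapman's point, the arithmetic can be sketched in a few lines. The 90 to 95 percent duplicate rates come from his quote and the 691,200 seconds from Flynn's tweet; the function name is hypothetical, and the actual number of unique emails in the FBI's review was not disclosed.

```python
TOTAL_EMAILS = 650_000
SECONDS_IN_EIGHT_DAYS = 8 * 24 * 60 * 60  # 691,200, the figure in Flynn's tweet


def unique_remaining(total, duplicate_percent):
    """Emails left to review after removing the given percentage of duplicates."""
    return total * (100 - duplicate_percent) // 100


# Chapman's range: 90 to 95 percent of added emails are typically duplicates.
low_estimate = unique_remaining(TOTAL_EMAILS, 95)
high_estimate = unique_remaining(TOTAL_EMAILS, 90)

print(f"Seconds in eight days: {SECONDS_IN_EIGHT_DAYS:,}")
print(f"Emails left to review: {low_estimate:,} to {high_estimate:,}")
```

Under those assumed rates, the "email per second" problem shrinks to tens of thousands of emails, before any further filtering by sender or topic.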

And getting to that core set of unique, unseen emails is a relatively fast endeavor. John Rosenthal, partner and chair of the e-discovery and information governance practice at Winston & Strawn, explained that to find and discard duplicates, large collections of emails can be run through a process called “hashing” that “generates a unique alphanumeric value for every document—think of it like a fingerprint.” The process is similar to that used in blockchain technology.

Since each email’s hash is tied to its unique content and metadata, the right tools can automatically compare the hash values from the 650,000 emails to the hash values of emails the FBI had already obtained from former Secretary of State Clinton in its investigation.
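A minimal sketch of that comparison, assuming each email is reduced to a few normalized fields and fingerprinted with SHA-256. The field names, the normalization rules, and the sample messages are illustrative assumptions, not a description of the FBI's tools.

```python
import hashlib


def email_fingerprint(sender, recipients, subject, body):
    """Return a SHA-256 hex digest over normalized email fields."""
    parts = [
        sender.strip().lower(),
        ",".join(sorted(r.strip().lower() for r in recipients)),
        subject.strip(),
        body.strip(),
    ]
    # Join with a separator that should not occur inside the fields.
    canonical = "\x1f".join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_new_from_known(new_emails, known_hashes):
    """Partition new_emails into (unseen, duplicates) by fingerprint."""
    unseen, duplicates = [], []
    seen = set(known_hashes)
    for email in new_emails:
        digest = email_fingerprint(**email)
        if digest in seen:
            duplicates.append(email)
        else:
            seen.add(digest)  # also catches repeats within the new batch
            unseen.append(email)
    return unseen, duplicates


# Hypothetical sample data.
already_reviewed = [
    {"sender": "a@example.com", "recipients": ["b@example.com"],
     "subject": "Schedule", "body": "Call at 3pm."},
]
new_batch = [
    {"sender": "A@example.com ", "recipients": ["b@example.com"],
     "subject": "Schedule", "body": "Call at 3pm."},
    {"sender": "c@example.com", "recipients": ["a@example.com"],
     "subject": "Draft", "body": "See attached."},
]

known = {email_fingerprint(**e) for e in already_reviewed}
unseen, duplicates = split_new_from_known(new_batch, known)
print(len(unseen), "to review;", len(duplicates), "already seen")
```

Because the lookup is a set membership test, the cost per email stays roughly constant no matter how many emails have already been reviewed.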

This comparison, however, is far from an all-or-nothing process. Rosenthal explained that, in addition to exact replicas, hashing tools can also “find all near duplicate [emails] on a sliding scale” by comparing email hashes to see if they are “going to be pretty close” to one another.

Such “near duplicates” can indicate that original emails had been altered at one point in time, for example if there was “an earlier draft of the email or maybe it had a different attachment on it, or maybe it was a reply,” Rosenthal said.

And while “near duplicates would take a little longer to double check,” noted Ralph Losey, principal and national e-discovery counsel at Jackson Lewis, identifying them with widely available tools takes no additional processing time.
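The “sliding scale” idea can be sketched with a simple stand-in technique: break each email into overlapping three-word runs (“shingles”) and score the overlap between two emails from 0 to 1. This is an assumption for illustration only; commercial tools may use different fingerprinting methods, and the threshold and sample text are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def near_duplicates(candidate, reviewed, threshold=0.5):
    """Return (score, text) pairs scoring at or above threshold, best first."""
    scored = [(similarity(candidate, text), text) for text in reviewed]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical sample data: one earlier draft and one unrelated message.
reviewed = [
    "please send the final schedule for the thursday meeting",
    "lunch order for the team is attached",
]
draft = "please send the final schedule for the friday meeting"

matches = near_duplicates(draft, reviewed)
for score, text in matches:
    print(f"{score:.2f}  {text}")
```

Raising or lowering the threshold moves the cutoff along the scale, trading fewer false matches for more emails sent to a human for a second look.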

Deconstructing Emails

The FBI did not reveal the exact number of emails that were duplicates or near duplicates in its recent review, but even if it was a large tranche of the original 650,000, the question remains: How does one review even tens of thousands of unique, original emails in such a short time? The answer is easy: Break them apart into their essential elements.

One way modern e-discovery software accomplishes this feat is by building an index of its dataset, in this case emails, “where [all] the metadata is available in a field that can be then queried,” explained Mary Mack, executive director for the Association of Certified E-Discovery Specialists (ACEDS).

“So for example, you’d be able to look for [all emails containing] Clinton’s particular email address,” or filter through any emails sent from anyone using her private server’s ClintonEmail.com domain name, or even search for variations of the canonical name linked to an email address, such as “H.Clinton” or “Hillary.C,” Mack said.
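The kind of field index Mack describes might look something like the sketch below, which maps each address, domain, and display name to the emails that carry it. The field names, the `example.org` domain, and the sample records are placeholders chosen for illustration, not details from the review.

```python
from collections import defaultdict


def build_index(emails):
    """Map (field, value) pairs to the set of email ids that carry them."""
    index = defaultdict(set)
    for email_id, email in emails.items():
        for address in [email["from"]] + email["to"]:
            address = address.lower()
            index[("address", address)].add(email_id)
            index[("domain", address.rsplit("@", 1)[-1])].add(email_id)
        index[("display_name", email["from_name"].lower())].add(email_id)
    return index


def lookup(index, field, *values):
    """Return ids of emails matching any of the given values for one field."""
    hits = set()
    for value in values:
        hits |= index.get((field, value.lower()), set())
    return hits


# Hypothetical sample data.
emails = {
    1: {"from": "h.smith@example.org", "from_name": "H.Smith",
        "to": ["aide@example.com"]},
    2: {"from": "aide@example.com", "from_name": "Aide",
        "to": ["H.Smith@example.org"]},
    3: {"from": "news@example.net", "from_name": "Newsletter",
        "to": ["aide@example.com"]},
}

index = build_index(emails)
by_domain = lookup(index, "domain", "example.org")
by_name = lookup(index, "display_name", "H.Smith", "Helen.S")
print(sorted(by_domain), sorted(by_name))
```

The index is built once, after which each query is a dictionary lookup rather than a scan of every message.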

Being able to create an index also gives modern e-discovery tools the ability to organize emails by senders and receivers and, in many cases, to create “communication mapping, where you can have a visual map that would show who’s communicating with you and the frequency of the communications,” Rosenthal added.
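The data behind such a map is essentially a tally of who wrote to whom and how often. A minimal sketch, assuming each email carries a `from` address and a list of `to` addresses (both field names and the sample messages are hypothetical):

```python
from collections import Counter


def communication_counts(emails):
    """Count messages for each (sender, recipient) pair."""
    counts = Counter()
    for email in emails:
        sender = email["from"].lower()
        for recipient in email["to"]:
            counts[(sender, recipient.lower())] += 1
    return counts


# Hypothetical sample data.
messages = [
    {"from": "a@example.com", "to": ["b@example.com", "c@example.com"]},
    {"from": "a@example.com", "to": ["b@example.com"]},
    {"from": "b@example.com", "to": ["a@example.com"]},
]

edges = communication_counts(messages)
for (sender, recipient), total in edges.most_common():
    print(f"{sender} -> {recipient}: {total}")
```

A visualization tool would draw each pair as a line between two people, with the count setting the line's weight.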

In addition, reviewers can search the content of every single email for keywords or phrases that are related to a broad topic. To explain this, Rosenthal uses the example of “Thanksgiving,” noting that in addition to the word “Thanksgiving,” e-discovery software would find and see all emails “that reference ‘turkey,’ and all the ones that reference ‘cranberry,’ and all the ones that reference ‘green beans’; it would create buckets of those we could look at.”
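Rosenthal's Thanksgiving example can be approximated with a hand-written list of related terms. This is a simplification: real e-discovery software may derive related concepts automatically, whereas the sketch below assumes the reviewer supplies them, and the sample emails are invented.

```python
import re
from collections import defaultdict


def bucket_by_terms(emails, terms):
    """Group email ids under each term found in the body (whole words, any case)."""
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    buckets = defaultdict(list)
    for email_id, body in emails.items():
        for term, pattern in patterns.items():
            if pattern.search(body):
                buckets[term].append(email_id)
    return dict(buckets)


# Hypothetical sample data.
bodies = {
    1: "Who is bringing the turkey?",
    2: "Cranberry sauce and green beans are covered.",
    3: "Budget review moved to Monday.",
}
terms = ["thanksgiving", "turkey", "cranberry", "green beans"]

buckets = bucket_by_terms(bodies, terms)
for term, ids in buckets.items():
    print(f"{term}: {ids}")
```

None of the sample emails contains the word “Thanksgiving,” yet two of the three still land in a bucket, which is the point of searching on related terms.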

And though significant hardware capability would be needed for such an endeavor involving hundreds of thousands of emails, Mack noted it is well within the abilities of e-discovery practitioners.

“As a matter of fact, when [FBI Director James Comey] first thought [the review] would not be done before the election, most e-discovery professionals were like, ‘What? Give it to me, and I’ll get it done in the next 48 hours,’ because you can throw hardware at it in order to scale, if you need to, with most systems these days.”

“I have done these exact types of what I call investigative reviews,” Shannon Capone Kirk, e-discovery counsel at Ropes & Gray, added, noting that even assuming the 650,000 emails contained no duplicates, she would still view the task as “a small to medium investigative review. And I would have great confidence that I could complete that review not only in eight days, but under eight days.”

Accounting for Errors

Any process, no matter how meticulously performed, is prone to some error, and modern-day e-discovery is no exception. But for Losey, the small amount of uncertainty present in any automated e-discovery endeavor does not mean manual human review of documents is a more accurate and dependable process.

“People still think linear review of each and every document is the gold standard. In fact, once you go beyond 10,000 or so documents, it is very inaccurate. When you get into hundreds of thousands, it is terrible.”

Perfection is also not the goal of e-discovery in the U.S. court system, which aims for the far more tangible standard of reasonableness. “In the average civil case today, people can’t achieve perfection, and something may get missed because there is a cost-benefit analysis,” Rosenthal said. “If you look at every single document, it’s too cost prohibitive and not reasonable.”

He added, though, that government criminal or civil investigations are different as there is “a lot at stake,” and therefore it is likely that reviewers “exercise more due diligence to make sure that nothing is missed here, because of the consequences of a wrong decision, or the consequences of a right decision.”

Rosenthal believed that while something could always be missed in a review, given the FBI’s thoroughness and the potentially small set of emails it needed to review, “the likelihood of something here being missed is relatively small.”


Originally published on Legal Technology News. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.
