“I thought I just reviewed that document!” is something every document reviewer since the dawn of time has thought. Duplicate documents, and documents that seem like duplicates but aren’t, can slow down document review, increase costs and lead to inconsistent document coding. That inconsistency is especially problematic during a privilege review, where one copy of a document may be withheld as privileged while a copy coded differently is produced. This article explains how deduplication works and the shortcomings inherent in that process, which is why reviewers who claim to keep seeing duplicates are right and wrong at the same time. It also offers a solution: two technologies, near-dupe detection and email threading, that can greatly reduce the number of “duplicate” documents that must be reviewed.

Deduplication sounds simple in theory: remove all of the duplicate documents when you load them into your document review platform. In practice, it isn’t simple. Are a PDF and a Word document containing exactly the same text duplicates? (No.) What about a document that is attached to an email and also saved on someone’s hard drive? (Yes.)

Both answers come from hash values, the standard method document review platforms use to deduplicate documents. As documents are processed into the platform, a hash value is generated for each one and compared with the hash values of all the documents already processed, and any document with a matching hash value is not made available for review. (How hash values are generated is beyond the scope of this article.) Because the hash value is computed from the document’s underlying data, the slightest difference between two documents, even one imperceptible to the reviewer, or the same content saved in different formats, produces a different hash value. Documents that aren’t exact duplicates therefore survive deduplication, and you’ll be stuck looking at what is practically the same document multiple times.
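As a rough illustration of the hash-based deduplication described above, here is a minimal Python sketch. It assumes each document is available as raw bytes and uses SHA-256 from Python’s standard `hashlib` module; the `Document` class, the `deduplicate` function and the sample document IDs are hypothetical, and real platforms differ in which hash algorithm they use and in exactly what data they hash. Documents are processed in load order, and any document whose hash matches one already seen is withheld from review.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str     # hypothetical identifier assigned at processing
    content: bytes  # raw bytes of the file as collected


def hash_value(content: bytes) -> str:
    """Return a hex digest of the document's raw bytes.

    SHA-256 is an illustrative assumption; platforms vary in the
    algorithm they use and in which data they feed into it.
    """
    return hashlib.sha256(content).hexdigest()


def deduplicate(documents):
    """Process documents in load order, keeping the first copy of each hash.

    Returns (review_set, suppressed): review_set lists the documents made
    available for review; suppressed maps each withheld doc_id to the
    doc_id of the earlier document that had the same hash value.
    """
    seen = {}        # hash value -> doc_id of first document with that hash
    review_set = []
    suppressed = {}
    for doc in documents:
        h = hash_value(doc.content)
        if h in seen:
            # Exact match with an already-processed document: withhold it.
            suppressed[doc.doc_id] = seen[h]
        else:
            seen[h] = doc.doc_id
            review_set.append(doc)
    return review_set, suppressed


# Hypothetical sample data.
docs = [
    Document("HD-001", b"Q3 board memo: approve the merger."),      # saved on a hard drive
    Document("EM-001-ATT", b"Q3 board memo: approve the merger."),  # same file attached to an email
    Document("HD-002", b"Q3 board memo: approve the merger. "),     # one trailing space added
]
review, suppressed = deduplicate(docs)
print([d.doc_id for d in review])  # ['HD-001', 'HD-002']
print(suppressed)                  # {'EM-001-ATT': 'HD-001'}
```

The email attachment is suppressed because its bytes, and therefore its hash value, match the copy saved on the hard drive. The third document differs only by a trailing space, yet it gets its own hash value and lands in the review set. A PDF and a Word file containing the same text behave the same way: their underlying bytes differ, so their hash values differ too.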
