One of the most expensive technical portions of e-discovery is something called “processing.” Nontechnical personnel involved in e-discovery, including many lawyers, have only the vaguest notion of what is done in the processing phase. They know that collected data goes in one end and comes out the other ready for review, but what happens in between is a mystery. However, the processing phase involves some potentially important choices that can significantly affect what electronically stored information (ESI) is available for review and production, as well as the associated costs.

The general objectives of processing include identifying exactly what elements or items of ESI have been submitted for processing, including their associated metadata. This allows intelligent and informed decisions to be made that can reduce the volume of data selected for continuation along the path to review. At the same time, the application of processing technology and analysis to the data needs to be performed under strict quality-control standards and with chain-of-custody requirements in mind.

When data is submitted for processing, it is likely to consist of a variety of types or formats, such as word processing documents, backup files, email files, etc. It is also common for data, including email, to be stored in “container” files, such as .zip files or .pst files, which require extraction of the individual files and emails from their containers. Backup data may need to be restored. Moreover, some data, like data in obsolete formats, may need to be converted before further processing can occur. Each file must be captured along with associated metadata and all of this information must be catalogued.

Opportunities then arise to reduce the volume of data, making review less expensive and in some cases reducing the risk of inconsistent review decisions about the same documents. For example, the data set can be “de-duplicated” in various ways, and “near de-duplication” or identification of similarity or common “concepts” among documents can be achieved. Full-text indexing makes the data searchable, and search terms can be applied to help separate out clearly irrelevant data.
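Exact de-duplication is commonly implemented by hashing each document's content: items with identical hashes are copies, and only the first survives into review. The sketch below assumes documents arrive as (id, bytes) pairs; real tools typically hash a normalized form of the document (and may scope de-duplication per custodian or across the whole matter), so this is a minimal illustration.

```python
import hashlib

def deduplicate(documents):
    """Hash-based exact de-duplication over (doc_id, content_bytes) pairs.

    Returns the surviving document IDs and a list of (duplicate, original)
    pairs so the suppression decisions remain documented and reversible.
    """
    seen = {}           # content digest -> first doc_id seen with it
    unique, duplicates = [], []
    for doc_id, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates.append((doc_id, seen[digest]))
        else:
            seen[digest] = doc_id
            unique.append(doc_id)
    return unique, duplicates
```

Recording which original each duplicate mapped to matters: if the surviving copy is later produced or withheld, the same decision applies to its suppressed twins.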

Typically, certain files will cause problems for the initial application of processing technology; common examples include password-protected files and corrupt files. These are called “exceptions,” and decisions need to be made as to how they are to be handled. For example, to what extent will efforts to crack passwords be pursued? Where exceptions are not identified or resolved, some potentially relevant ESI may never see the light of day.

Some or all of the data may need to be transformed into other formats for purposes of review, depending on the characteristics of the review software that will be used. At this point, quality assurance procedures should be implemented. These might include, for example, looking at samples of the processed data and comparing this output with expectations based on information available before processing.
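One simple form of the quality-assurance check described above is to compare the processed item count against the pre-processing inventory and pull a random sample for human inspection. The sketch below is minimal and its names and parameters are invented for illustration; real QA protocols involve many more checks (byte counts, date ranges, per-custodian totals).

```python
import random

def qa_sample(catalog, expected_count, sample_size=3, seed=42):
    """Spot-check processed output against pre-processing expectations.

    `catalog` maps item IDs to processed content. Flags a count mismatch
    and returns a reproducible random sample of IDs for human review.
    """
    issues = []
    if len(catalog) != expected_count:
        issues.append(
            f"count mismatch: got {len(catalog)}, expected {expected_count}"
        )
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    ids = sorted(catalog)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    return sample, issues
```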

Another important element of processing from beginning to end is reporting. For example, each element of data should be tracked through each step in the process, and all decision making with respect to selectively reducing the volume of data should be documented. Information as to the impact of those decisions on the universe of data should be readily available to help inform decision making.
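The reporting idea can be reduced to a running tally: record the item count after each volume-reduction decision so the impact of every step is visible at a glance. The sketch below is illustrative; real processing reports also track bytes, exception counts, and per-custodian breakdowns.

```python
def volume_report(steps):
    """Summarize how each processing decision changed the item count.

    `steps` is an ordered list of (step_name, item_count) pairs as data
    moves through processing. Each line after the first shows the delta
    attributable to that step.
    """
    lines = []
    prev = None
    for name, count in steps:
        delta = "" if prev is None else f" ({count - prev:+d})"
        lines.append(f"{name}: {count}{delta}")
        prev = count
    return lines
```

A report like this makes it easy to answer, for instance, how many items a particular search-term decision removed, before committing to it.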

In cases with substantial volumes of electronic information, collected data must be processed in software designed for discovery purposes before it is ready for review by attorneys. Processing is among the more expensive technical elements of e-discovery, yet many lawyers and clients do not understand what it is or how it affects discovery. Attorneys need not become experts in the minutiae of data-processing technology, but a grasp of its major components helps in understanding the life cycle of data in e-discovery.