To attorneys, the mention of structured data can provoke indifference, confusion, anxiety, frustration or some mix of these. For some, simply understanding the differences between structured and unstructured data can be daunting. That said, a wealth of information about many data sets lives as structured data (i.e., data residing in databases; think tables and fields, not documents) within a variety of systems. When approached strategically, structured data can increase e-discovery efficiency and surface valuable information about document sets that might otherwise be invisible or lost.

As an example, consider a product liability suit that involves collection, analysis and review from massive data repositories, including product quality assessments, compliance reports, regulatory reporting systems, and other documents that reside in a variety of databases or systems within an organization. In these instances, counsel faces a data set of hundreds of thousands of documents (hundreds of gigabytes of data) that needs to be reviewed. Traditionally, counsel approaches these data sets with keyword searches to locate and collect the potentially relevant data for review. However, by using structured data from those systems as the lens through which to look at the documents and data, attorneys can determine whether the relevant product issue or defect has a unique identifier (e.g., a certain code, ID number or other classifier that may even be constant across the enterprise). If so, rather than relying on keywords that might be under- or over-inclusive, the team can query on that identifier to find all of the records and underlying documents associated with that product or issue.
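To make the contrast concrete, the following is a minimal sketch in Python using an in-memory SQLite database. The table `quality_records`, its columns, the defect codes and the file paths are all hypothetical, invented for illustration; a real quality or compliance system will have its own schema.

```python
import sqlite3

# Hypothetical sample rows: (record_id, defect_code, doc_path, summary).
SAMPLE_ROWS = [
    (1, "DX-114", "reports/qa_0001.pdf", "Latch failure observed during drop test"),
    (2, "DX-114", "reports/qa_0002.pdf", "Unit returned, closure would not hold"),
    (3, "DX-207", "reports/qa_0003.pdf", "Latch color mismatch on packaging photo"),
    (4, "DX-114", "compliance/cr_0004.docx", "Regulatory notice regarding hinge and latch"),
]

def build_demo_db():
    """Create an in-memory database standing in for a quality system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quality_records ("
        "record_id INTEGER PRIMARY KEY, defect_code TEXT, "
        "doc_path TEXT, summary TEXT)"
    )
    conn.executemany("INSERT INTO quality_records VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn

def docs_by_identifier(conn, defect_code):
    """Return documents tied to one defect code (exact match on the identifier)."""
    rows = conn.execute(
        "SELECT doc_path FROM quality_records "
        "WHERE defect_code = ? ORDER BY record_id",
        (defect_code,),
    )
    return [row[0] for row in rows]

def docs_by_keyword(conn, keyword):
    """Return documents whose summary text contains the keyword."""
    rows = conn.execute(
        "SELECT doc_path FROM quality_records "
        "WHERE summary LIKE ? ORDER BY record_id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = build_demo_db()
    print("identifier:", docs_by_identifier(conn, "DX-114"))
    print("keyword:   ", docs_by_keyword(conn, "latch"))
```

In this toy data, the keyword "latch" misses the second record (which describes the same defect in different words) and pulls in the third (which mentions a latch but concerns a different issue), while the identifier query returns exactly the three records coded to the defect.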

Counsel can further refine this methodology by filtering on specific date ranges, relevant metadata, document titles or categories, or other clues that unlock potentially relevant information within the data set. Counsel can also use this process to exclude documents from the data set and potentially reduce the overall review set, saving time and money for attorneys and their clients. This type of analysis can be particularly useful in industries that rely heavily on enterprise-wide system integration, including financial services, insurance, health care and manufacturing.
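The refinement and exclusion steps can be sketched the same way. This is an illustrative Python example, not a prescribed workflow: the `records` table, the issue IDs, the category labels and the dates are all assumptions, and dates are stored as ISO 8601 text so that they compare correctly as strings.

```python
import sqlite3

# Hypothetical sample rows: (doc_path, issue_id, category, created).
SAMPLE_RECORDS = [
    ("a.pdf", "ISS-9", "complaint", "2019-03-01"),
    ("b.pdf", "ISS-9", "marketing", "2019-04-15"),
    ("c.pdf", "ISS-9", "test report", "2019-06-30"),
    ("d.pdf", "ISS-9", "complaint", "2021-01-10"),
    ("e.pdf", "ISS-4", "complaint", "2019-05-05"),
]

def build_sample_records():
    """Create an in-memory table standing in for an enterprise system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records "
        "(doc_path TEXT, issue_id TEXT, category TEXT, created TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", SAMPLE_RECORDS)
    return conn

def refine(conn, issue_id, start, end, excluded_categories=()):
    """Filter by identifier and date range, optionally excluding categories."""
    sql = (
        "SELECT doc_path FROM records "
        "WHERE issue_id = ? AND created BETWEEN ? AND ?"
    )
    params = [issue_id, start, end]
    if excluded_categories:
        # One placeholder per excluded category keeps the query parameterized.
        placeholders = ", ".join("?" for _ in excluded_categories)
        sql += f" AND category NOT IN ({placeholders})"
        params.extend(excluded_categories)
    sql += " ORDER BY created"
    return [row[0] for row in conn.execute(sql, params)]

if __name__ == "__main__":
    conn = build_sample_records()
    in_range = refine(conn, "ISS-9", "2019-01-01", "2019-12-31")
    reduced = refine(conn, "ISS-9", "2019-01-01", "2019-12-31", ["marketing"])
    print(f"{len(in_range)} documents in range, {len(reduced)} after exclusions")
```

Each added filter narrows the candidate set before any document is opened, which is where the reduction in review volume comes from.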