As companies realize the benefits of big data for their research and development, marketing, sales, branding, and revenue growth, they will increasingly have to reckon with its risks. Using and monetizing big data raises enormous legal questions and potential liabilities. The most salient of these legal issues, at least in the near term, revolve around privacy, regulatory compliance, and the duty to intervene.

When companies analyze extremely large pools of data, they often attempt to protect the privacy of individuals through “anonymization,” the process of removing or replacing individually identifying information in a communication or record. Communications and records can be made completely anonymous by removing all identifiers, or made pseudonymous by assigning each individual a replacement identifier, such as a 10-digit code.
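The difference between the two approaches can be illustrated with a minimal Python sketch. The function names, the `name` field, and the sample record layout are assumptions made for illustration only; they do not describe any particular company's practice.

```python
import secrets


def anonymize(records, id_fields=("name",)):
    """Full anonymization: drop the identifying fields entirely."""
    return [
        {key: value for key, value in record.items() if key not in id_fields}
        for record in records
    ]


def pseudonymize(records, id_field="name"):
    """Pseudonymization: swap each identifier for a random 10-digit code.

    The same individual always receives the same code, so their records
    can still be linked to one another without exposing the original value.
    """
    codes = {}    # original identifier -> replacement code
    used = set()  # codes already handed out, to avoid collisions
    output = []
    for record in records:
        original = record[id_field]
        if original not in codes:
            code = f"{secrets.randbelow(10**10):010d}"
            while code in used:
                code = f"{secrets.randbelow(10**10):010d}"
            used.add(code)
            codes[original] = code
        cleaned = dict(record)  # copy so the input is left untouched
        cleaned[id_field] = codes[original]
        output.append(cleaned)
    return output, codes
```

Note that the returned `codes` mapping is itself identifying information: anyone holding it can reverse the pseudonymization, so in practice it would need to be protected or destroyed.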

Of course, stories of incomplete or ineffective anonymization are rife. In one of the most infamous incidents, the Massachusetts Group Insurance Commission released “anonymized” data on state employees' hospital visits in the mid-1990s as part of a study. To demonstrate the limitations of anonymization, Latanya Sweeney, then a graduate student, publicly identified Governor William Weld's records in the data without difficulty. Continuing her work on this topic, Sweeney showed in 2000 that 87 percent of all Americans could be uniquely identified using only three data points: date of birth, gender, and ZIP code.
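The kind of risk Sweeney described can be estimated for any dataset by counting how many records have a combination of seemingly harmless fields that no other record shares. The following is a rough Python sketch, not a reconstruction of her method; the field names `birthdate`, `gender`, and `zip` and the function name are hypothetical.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers=("birthdate", "gender", "zip")):
    """Return the share of records whose quasi-identifier combination
    appears exactly once in the dataset."""
    if not records:
        return 0.0
    # Build one tuple per record from the chosen fields.
    keys = [tuple(record[field] for field in quasi_identifiers) for record in records]
    counts = Counter(keys)
    # A record is at risk when no other record shares its combination.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)
```

A record counted as unique here can be re-identified by anyone who holds a second dataset, such as a voter roll, that pairs the same three fields with a name.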