As more algorithm-driven technology comes to market, the debate over how individuals' de-identified data is used continues to grow.

A class action lawsuit filed in a Chicago federal court last month highlights the use of sensitive de-identified data for commercial ends. Plaintiffs represented by law firm Edelson allege that the University of Chicago Medical Center gave Google the electronic health records (EHR) of nearly all of its patients from 2009 to 2016, which Google would use to create products. An EHR, the digital version of a patient's paper chart, includes the patient's height, weight, vital signs, and history of medical procedures and illnesses.

While the hospital asserted that it de-identified the data, Edelson claims the records included date and time stamps and “copious” free-text medical notes that, combined with Google's other massive troves of data, could easily identify patients, in violation of the Health Insurance Portability and Accountability Act (HIPAA).
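
To see how such a linkage attack could work in principle, consider a minimal sketch in Python. The column names and sample values here are hypothetical, not drawn from the case; the point is only that joining a “de-identified” table to an identified one on a shared quasi-identifier, such as a precise timestamp, re-attaches names to medical data.

```python
import pandas as pd

# "De-identified" hospital records: no names, but precise timestamps remain.
deidentified = pd.DataFrame({
    "admit_time": ["2015-03-02 14:37", "2015-03-02 09:12"],
    "diagnosis":  ["hypertension", "type 2 diabetes"],
})

# An identified dataset the data holder already possesses
# (e.g., location or appointment history tied to real names).
identified = pd.DataFrame({
    "name":       ["Alice Smith", "Bob Jones"],
    "admit_time": ["2015-03-02 14:37", "2015-03-02 09:12"],
})

# Joining on the timestamp re-attaches identities to the medical data.
reidentified = deidentified.merge(identified, on="admit_time")
print(reidentified)  # each row now pairs a name with a diagnosis
```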

“I think the biggest concern is the quantity of information Google has about individuals and its ability to reidentify information, and this gray area of if HIPAA permits it if it was fully de-identified,” said Fox Rothschild partner Elizabeth Litten.

Litten noted that transferring such data to Google, which has a host of information collected from other services, makes labeling data “de-identified” risky in that instance. “I would want to be very careful with who I share my de-identified data with, [or] share information with someone that doesn't have access to a lot of information. Or [ensure] in the near future the data isn't accessed by a bigger company and made identifiable in the future,” she explained.

If the data can be reidentified, it may also fall under the scope of the European Union's General Data Protection Regulation (GDPR) or California's upcoming data privacy law, noted Cogent Law Group associate Miles Vaughn.

“This is a big issue with the GDPR and soon the CCPA [California Consumer Privacy Act] over if something is truly anonymized,” Vaughn said. If someone really wanted to, they could “cross-list it with existing information and you could find very strong hits,” he cautioned.

Without a federal U.S. law governing all data ownership, most people allow a company to use their data by agreeing to online terms of service, noted Cogent Law Group partner Thomas Gross. But an NBC News article highlighted how data can end up in use cases the original owner never imagined.

Last March, NBC News discovered that IBM was using photos from Flickr account holders who had agreed to “Creative Commons” licenses, which allow reuse of a photo without payment of a license fee. The pictures were used as a “training dataset” to improve IBM's facial recognition software.
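
As a rough illustration of how a training set might be assembled from license-tagged photo metadata, here is a simplified Python sketch. The record layout and license strings are hypothetical and do not represent IBM's actual pipeline.

```python
# Hypothetical photo metadata records, each tagged with a license.
photos = [
    {"url": "https://example.com/photo1.jpg", "license": "CC-BY-2.0"},
    {"url": "https://example.com/photo2.jpg", "license": "All Rights Reserved"},
    {"url": "https://example.com/photo3.jpg", "license": "CC-BY-SA-2.0"},
]

# Keep only photos whose license permits reuse without a fee.
training_set = [p["url"] for p in photos if p["license"].startswith("CC-")]
print(training_set)  # these images would then feed a face-recognition trainer
```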

According to the article, IBM allows photographers to remove their pictures from its database. But the company did not say whether photographers can remove the features derived from their photos from IBM's datasets or software. The conundrum raises an interesting question: is it ever too late to revoke de-identified data once it has already been built into software?

“The idea of machine learning and so many people contributing and one person saying, 'Pull out my information' and having to pull it out is challenging,” Vaughn said.
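
A minimal sketch, using scikit-learn and synthetic data purely for illustration, of why honoring such a request is hard: deleting one contributor's row from the dataset leaves the already-fitted model untouched, so their influence persists unless the model is retrained from scratch without them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # 100 contributors, 5 features each
y = (X[:, 0] > 0).astype(int)    # synthetic labels

model = LogisticRegression().fit(X, y)
weights_before = model.coef_.copy()

# Deleting one contributor's row from the dataset does not touch the model:
# their influence is already baked into the fitted weights.
X_removed, y_removed = np.delete(X, 0, axis=0), np.delete(y, 0)
assert np.array_equal(model.coef_, weights_before)

# Truly removing their contribution generally means retraining from scratch,
# which is costly at scale and is why "machine unlearning" is a hard problem.
model_retrained = LogisticRegression().fit(X_removed, y_removed)
```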