Technology: The top 3 myths of big data, Yotta, Yotta, Yotta…
Good data hygiene means that organizations should keep what's valuable and purge what's not. This unassailable fact doesn't really change despite the emergence of big data.
December 06, 2013 at 03:00 AM
The original version of this story was published on Law.com
A yottabyte equals 1 trillion terabytes (the terabyte being the largest data unit in common use today). That's not just big data, it's really big data, and it's clearly the direction that things are heading, particularly given the prevalence of today's “keep everything” mantra. It's this thinking, combined with the equally pernicious belief that “storage is cheap,” that has brought law firms, private and public companies and government agencies alike to the data deluge precipice, where even the possibility of big data can't provide salvation.
In order to create a more sustainable data management future, several big data myths need to be debunked.
Myth #1 – All data is valuable
Under the first big data myth, many tend to believe that the 3 V's of big data (Volume, Velocity, Variety) are all that matter. The thinking goes that even if we can't make sense of data right now, in the future, big data applications will continue to advance so that historical data troves can be mined for useful nuggets.
One use case is at least conceptually compelling. Assume, for example, that a company has 10 years of unexamined data relating to a particular area of its business. Hypothetically, at some point in the future, it is able to leverage big data analytics to examine this historical information in an effort to predict future customer trends. The historical data could be used to check that the prediction engine is accurate, by using data from years one through nine to predict what will happen in year ten (which has already occurred). This way, the argument goes, the prediction capabilities could be validated against the historical data before they're applied to the future.
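The year-ten check described above is a simple holdout test. As a rough sketch only: the yearly figures below are invented for illustration, and the straight-line trend stands in for whatever prediction engine a company might actually use (the article does not name one).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def backtest_last_year(yearly_values):
    """Fit on every year except the last, then predict the held-out final year."""
    years = list(range(1, len(yearly_values) + 1))
    slope, intercept = fit_line(years[:-1], yearly_values[:-1])
    predicted = slope * years[-1] + intercept
    actual = yearly_values[-1]
    relative_error = abs(predicted - actual) / actual
    return predicted, actual, relative_error


# Hypothetical ten years of a business metric (assumption, not real data).
yearly_metric = [100, 110, 120, 130, 140, 150, 160, 170, 180, 195]

predicted, actual, relative_error = backtest_last_year(yearly_metric)
print(f"predicted year 10: {predicted:.1f}, actual: {actual}, "
      f"relative error: {relative_error:.1%}")
```

A small error on the held-out year is the kind of evidence the argument relies on. It says nothing about whether the retained data was worth keeping in the first place, which is the point taken up next.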
The fault in this logic is the failure to realize that not all data is created equal. Data Value (as an additional V) is a critical component in the big data equation. Not only is the value of data subjective, depending on the analytical task at hand, but it often has a definite shelf life. For example, customer sentiment data (how a client feels about a given product or solution) may be very short-lived. Consider whether a six-month-old satisfaction rating regarding a dinner at an upscale restaurant has any value after the restaurant has profoundly changed its menu. It might conceivably show how the new cuisine is faring, but then consider whether this information still has value after another twelve months.
This illustrates how data value decreases over time, but what about information that has no value to begin with? A recent survey by the Compliance, Governance and Oversight Council revealed that 69 percent of an average company's data had no legal, regulatory, legal hold or other business value. This mix contains duplicate (and near-duplicate) data, employees' personal information and other corporate noise that's only tangentially related to the core business at hand.
The bottom line is that retaining more of this valueless data won't yield better results even with the advent of big data initiatives. In fact, the opposite is true.
Myth #2 – With big data, the more information the better
The Rand Corporation, in its recent Data Governance Survey, revealed that the participants expected data growth in the next year “between 26 percent and 50 percent, with several participants indicating they expect data growth of more than 200 percent over the next year.” The median amount of data stored by survey respondents was between 20TB and 50TB, with a shocking 22 percent in the petabyte range.
While these data explosion headlines have largely become numbing, the fact is that more information is not inherently better (particularly if it doesn't have value, per above). What gets fewer headlines is the “signal to noise” aspect of data management run without limits. The notion that extraneous data noise has deleterious consequences is probably best summarized in Nate Silver's The Signal and the Noise: Why So Many Predictions Fail — but Some Don't. The book's premise comes from the electrical engineering field, where a signal is something that conveys information, while noise is an unwanted or irrelevant addition to the signal. The problem occurs (and is particularly acute with regard to big data analytics) when it's nearly impossible to tell which is which: valuable “signal” or distracting “noise.”
This signal/noise issue is the final straw that makes governance a truly near-term corporate imperative. Even assuming a company is willing to roll the dice on expanding e-discovery costs, botched regulatory compliance and periodic privacy breaches (perhaps under the belief that those elements don't grow the business), workers must be able to find the right information at the right time to do their jobs. So yes, storage is relatively cheap, but the ramifications of a “store everything forever” approach can be quite expensive.
Myth #3 – Big data opportunities come with no costs
Despite the foregoing, it's clear that big data can have value to an organization — assuming the right data is harnessed at the right time. But even then, there is the flip side of the coin: How much does it cost to keep around terabytes of data that aren't yet being harnessed for big data analytics?
This is where the concept of information governance (IG) comes to the forefront. IG can be defined as:
“A cross-departmental framework consisting of the policies, procedures and technologies designed to optimize the value of information while simultaneously managing the risks and controlling the associated costs, which requires the coordination of e-discovery, records management and privacy/security disciplines.”
A recent AIIM study, Information governance – records, risks and retention in the litigation age, highlights the fact that senior management is ignoring the risks. The study found that 31 percent of respondents admitted their inferior electronic record keeping is causing problems with regulators and auditors, while 14 percent said they were incurring fines or bad publicity due to bad handling of information.
Here, the dark side of big data is often not weighed against the potential value. E-discovery is perhaps the easiest and most tangible way to illustrate the risks and costs of keeping data. In a recent survey, the Rand Corporation determined that it costs $18,000 (on average) to review a single gigabyte of content for e-discovery purposes. Given that even medium-sized e-discovery cases can run to hundreds of gigabytes, it's easy to see how this type of data, just by lying around, can and does carry costs.
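To put rough numbers on that, the arithmetic can be sketched as follows. The $18,000-per-gigabyte average is the figure cited above; the two case sizes are illustrative assumptions standing in for "hundreds of gigabytes," and real review costs will vary by matter.

```python
# Average review cost per gigabyte, as cited from the Rand survey above.
REVIEW_COST_PER_GB_USD = 18_000


def review_cost_usd(gigabytes):
    """Estimated e-discovery review cost at the cited per-gigabyte average."""
    return gigabytes * REVIEW_COST_PER_GB_USD


# Hypothetical case sizes (assumptions for illustration only).
for size_gb in (100, 500):
    print(f"{size_gb} GB -> ${review_cost_usd(size_gb):,}")
```

At the cited average, a 100 GB matter comes to $1.8 million in review costs and a 500 GB matter to $9 million.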
At the end of the day, good data hygiene means that organizations should keep what's valuable and purge what's not. This unassailable fact doesn't really change despite the emergence of big data. Savvy counsel is advised to watch out for these and other myths in this “keep it forever” era.