Big Data Proving a Big Challenge for Legal Business Analytics

Those looking to perform big data analytics often run into a fundamental startup problem: where to find the data in the first place.

January 05, 2018 at 12:37 PM

By Rhys Dipshan

The use of big data analytics—in short, finding insights in extremely large data sets—has been transforming the way many U.S. industries operate. And the legal industry is no exception: From case law to M&A transactions, attorneys are now able to see the forest for the trees.

But such insights do not come easy. Collecting and analyzing enough data to provide meaningful metrics to legal professionals can be an arduous affair, especially given the challenges of amassing certain types of data in the first place.

Federal case law data, for example, is fairly easy to obtain. There is PACER, which legal tech companies have learned to expertly mine. Meanwhile, other companies have been building their own case law repositories for years and tapping them for far-reaching analytics.

But with legal business data, such as legal service costs and industry-wide legal spend benchmarks, the challenges of building out big data analytics become more obvious. While law firms and legal departments would benefit from benchmarking data around legal services pricing, for example, they are often reluctant to disclose how much they pay or charge for their services in the first place.

“There is no reason for a private party such as a law firm or a spender on legal services, like the legal department of a big company, to give you access to your data unless you can offer some value on that,” said Ketan Jhaveri, co-founder and CEO of Bodhala, a big data legal spend analytics platform.

He added that finding initial databases of information to mine “is often a problem with data businesses. They call it the 'data cold start' problem.”

For many big data analytics companies, overcoming this problem means having to invest up front to find or elicit enough data to get going. “We spent millions of dollars creating our proprietary data set” to launch with, Jhaveri said. But Bodhala didn't buy its initial data set: by its own estimation, the for-sale legal data on the market wasn't up to par anyway.

Instead, the company built search indexing software, which uses proprietary machine learning algorithms to find, collect and classify publicly available online data on companies' finances and legal services costs.

“Whether it's from the press, whether it's what The American Lawyer reports, whether it shows up in a press release a law firm puts out, or whether it shows up [in] an SEC filing,” the technology will find it and add it to Bodhala's database, Jhaveri said.

With this market information in hand, Bodhala found it could further grow its database with private data from legal departments and law firms, which would hand over their information in exchange for access to Bodhala's analytics.

To be sure, Bodhala isn't the only legal spend analytics company in the market. Brightflag, for example, uses AI technology to automatically categorize invoices for legal departments and provide them with invoice and spend analytics. Wolters Kluwer's LegalVIEW BillAnalyzer also analyzes corporate legal invoices to find overcharges or invoices that deviate from set departmental standards. LegalVIEW also has a database surpassing $100 billion in invoices.

Brightflag's tool, however, offers data analytics on a micro, client-by-client level. But the company may be looking to expand to big data analytics down the road.

Brightflag CEO Ian Nolan, for example, told Legaltech News that while they don't currently offer market-wide legal spend metrics, “certainly the potential is to do that in the future.”

There are a few ways Brightflag could do this. The company could get permission to collect and amass its clients' invoice and spend data into a big data analytics platform. Or it could go the way of Bodhala and create its own proprietary database by pulling information from public websites.

But that strategy may become harder in the future. After all, online publications and websites could start to push back on how others can index and collect the data they publish or host. And for some analytics companies, that is already happening.

Social media site LinkedIn, for example, recently sought legal action to stop startup firm hiQ, which creates analytics tools for employers, from mining data from LinkedIn profiles. The action, which is ongoing, may have far-reaching consequences for how internet and data analytics companies operate in the future.

“One of [the complex points] the courts will have to address for people who are dependent on LinkedIn data, is that LinkedIn does allow itself to get indexed from Google because that is a driver of its traffic,” Jhaveri said. “So are there limits of taking advantage of being indexed by Google and search engines versus keeping your information away from other startups? What's the balance, what are the rules? That is something to be determined.”

Yet for the most part, Jhaveri is not too worried about the future of big data analytics. “If you're a company that is dependent on building proprietary data sets that are dependent on other companies' proprietary data sets, there is going to be a challenge. But if you're building your data sets [off of] primary sources, I don't think that raises a ton of issues.”
