Thomson Reuters, ROSS Fight Highlights Potential Risk in Data Scraping Practices
While data scraping is common in legal research, many companies have to navigate around copyrighted data and proprietary platforms. Sometimes, that can be easier said than done.
May 15, 2020 at 01:03 PM
Last week, Thomson Reuters sued legal research platform ROSS, alleging that the startup, via a third party, used a bot to siphon Westlaw's copyrighted material, which it then leveraged to train its AI systems.
A day after the suit was filed, ROSS CEO and co-founder Andrew Arruda denied the allegations in a post published on Medium. While Arruda acknowledged that ROSS started working in 2017 with a third party that obtained and passed on content to help train its AI system, he said no proprietary or copyrighted information was taken.
To be sure, while collecting proprietary content without permission isn't common in the legal research industry, data scraping is, and will remain, a regular occurrence.
"Data crawling is ubiquitous in legal technology and it's ubiquitous in the internet as well," noted Rick Merrill, founder and CEO of legal analytics software Gavelytics, who spoke as an industry participant with no inside knowledge regarding the ROSS-Thomson Reuters matter.
Exiger global markets president Brandon Daniels also called data scraping "a necessity," adding that "there are ways to do it that doesn't harm the cyber posture of companies and publishers and there are ways to do it that are fair use."
Similar to general technology companies, legal research and data analytics companies have various methods to build their platforms with publicly available and copyrighted material, Merrill and Daniels said.
"In legal technology there would be two types of data collection: data collection from government sources, court websites, the Department of Justice and things of that nature. Generally speaking, crawling that type of information can be permissible," Merrill explained. He added there is "crawling private information, copyrighted or information that requires a username and password," which is a wholly different matter.
However, collecting copyrighted information or password-protected material isn't unheard of in the legal research field, providers said. In such cases, a legal research company will typically request to use another party's proprietary content and enter into a contract documenting the permission granted and the intended use, Merrill noted. Daniels also cited the 2006 Perfect 10 v. Google litigation as case law that established offering previews of copyrighted work as a non-infringing act.
"Companies take steps to collect copyrighted data in a format that is lawful, for instance allowing the user to determine what data they want to grab and acting as a passive conduit software, very similar to what Google does," he said.
For a more direct approach, many legal research platforms also purchase subscriptions from the content provider. "We have contracts with companies that essentially allows us access to copyrighted content and we pay them for it, for the use case and the use of [the content by] our customers and they get the advantage to keep the commercial right of their copyrightable material," Daniels said.
Still, some legal research companies may restrict competitors' access to their platforms. In its complaint against ROSS, for instance, Thomson Reuters noted that it prevented ROSS from having a Westlaw account.
Still, while there are data collection best practices in the industry, companies building their legal research and analytics programs can run into risks, Daniels noted.
"If you're building your legal research and legal analytics capabilities by, for instance, scraping websites or scraping content providers in a way that you shut down their website and create a cyber problem then yes, you do [have a risk]. That's not something you can do."
Companies can also run into legal risks with certain collection practices. For example, "If you're taking content and republishing it as your own without a link to the original or citation to the original content or not transforming it some type of way," Daniels added.
However, Merrill argued that if legal analytics platforms follow industry standards for data scraping, they shouldn't run into many issues.
"I think if those types of companies are careful and thoughtful about what they're doing, it's minimal litigation risk. If people try to cut corners, then there's litigation risk," he said.