Last week, Thomson Reuters sued legal research platform ROSS, alleging that the startup, via a third party, used a bot to siphon Westlaw's copyrighted material and then used that material to train its AI systems.

A day after the suit was filed, ROSS CEO and co-founder Andrew Arruda denied the allegations in a post published on Medium. While Arruda acknowledged that ROSS started working with a third party in 2017, which obtained and passed on content to help ROSS train its AI system, he said no proprietary or copyrighted information was taken.

To be sure, while collecting proprietary content without permission isn't common in the legal research industry, data scraping is, and will remain, a regular occurrence. 

"Data crawling is ubiquitous in legal technology and it's ubiquitous in the internet as well," noted Rick Merrill, founder and CEO of legal analytics software Gavelytics, who spoke as an industry participant with no inside knowledge regarding the ROSS-Thomson Reuters matter.

Exiger global markets president Brandon Daniels also called data scraping "a necessity," adding that "there are ways to do it that doesn't harm the cyber posture of companies and publishers and there are ways to do it that are fair use."

Similar to general technology companies, legal research and data analytics companies have various methods to build their platforms with publicly available and copyrighted material, Merrill and Daniels said.

"In legal technology there would be two types of data collection: data collection from government sources, court websites, the Department of Justice and things of that nature. Generally speaking, crawling that type of information can be permissible," Merrill explained. He added there is "crawling private information, copyrighted or information that requires a username and password," which is a wholly different matter.

However, collecting copyrighted information or password-protected material isn't unheard of in the legal research field, providers said. In those cases, a legal research company will request to use another party's proprietary content and enter into a contract documenting the permission granted and the intended use, Merrill noted. Daniels also cited the 2006 Perfect 10 v. Google litigation as case law establishing that offering previews of copyrighted work is not infringement.

"Companies take steps to collect copyrighted data in a format that is lawful, for instance allowing the user to determine what data they want to grab and acting as a passive conduit software, very similar to what Google does," he said.

For a more direct approach, many legal research platforms also purchase subscriptions from content providers. "We have contracts with companies that essentially allows us access to copyrighted content and we pay them for it, for the use case and the use of [the content by] our customers and they get the advantage to keep the commercial right of their copyrightable material," Daniels said.

Still, some legal research companies may restrict competitors' access to their platforms. In its complaint against ROSS, for instance, Thomson Reuters noted that it prevented ROSS from having a Westlaw account.

Even with data collection best practices in place across the industry, companies building their legal research and analytics programs can run into risks, Daniels noted.

"If you're building your legal research and legal analytics capabilities by, for instance, scraping websites or scraping content providers in a way that you shut down their website and create a cyber problem then yes, you do [have a risk]. That's not something you can do."

Companies can also run into legal risks with certain collection practices, Daniels added, for example "if you're taking content and republishing it as your own without a link to the original or citation to the original content or not transforming it some type of way."

However, Merrill argued that if legal analytics platforms follow industry standards for data scraping, they shouldn't run into many issues.

"I think if those types of companies are careful and thoughtful about what they're doing, it's minimal litigation risk. If people try to cut corners, then there's litigation risk," he said.