Websites offer a substantial amount of information to their customers, subscribers and often the general public. Some websites, however, require visitors to provide personal information, set up an account and password, and/or pay a fee to access specific, more coveted information.

Certain visitors to these websites, such as the site's competitors, may want to obtain this information through methods such as data scraping (i.e., the extraction of data from a website through the use of a computer program) but may not wish to disclose their identity. These visitors, therefore, use false names and other fictitious information to establish accounts through which they can access all the data on the website. This practice will usually violate the terms of service of the website.

Does it also violate the Computer Fraud and Abuse Act (CFAA), which prohibits accessing a computer without authorization or in excess of authorization? There is a scarcity of case law on the issue. However, two recent federal district court decisions provide some guidance on whether using sham information to establish accounts for the purpose of data scraping violates the CFAA.


hiQ Labs v. LinkedIn

In hiQ Labs, Inc. v. LinkedIn Corp., hiQ Labs, which provides businesses with information about their workforces based on statistical analysis of publicly available employee data, sought a preliminary injunction to prevent LinkedIn from blocking hiQ's access to users' public profiles on LinkedIn's site. After apparently tolerating hiQ's access to and use of the public profiles for years, LinkedIn sent hiQ a cease-and-desist letter threatening action under the CFAA and other theories of civil liability. In addition to seeking a preliminary injunction, hiQ sought a declaration from the district court that it had not violated, and would not violate, the CFAA by accessing LinkedIn's public profiles.

The district court granted hiQ's request for a preliminary injunction, finding that hiQ would suffer immediate and irreparable harm because its business depended on access to information on LinkedIn's site that LinkedIn's users had expressly made public, and that hiQ had a likelihood of success on the merits. In ruling in hiQ's favor, the district court found a substantial likelihood that hiQ would prevail in showing that its access to public profiles was not unauthorized access to LinkedIn's computers in violation of the CFAA, even after LinkedIn had notified hiQ that it was no longer permitted to access the site.

In reaching this decision, the district court distinguished the Ninth Circuit's CFAA decisions in Facebook, Inc. v. Power Ventures, Inc. and United States v. Nosal. In Power Ventures, the Ninth Circuit held that a party that accessed a website after permission had been explicitly revoked, and that did so by circumventing IP address blocks, violated the CFAA. Similarly, in Nosal, the Ninth Circuit held that a defendant whose login credentials had been revoked, but who used a current employee's credentials to access his former employer's computer system, acted without authorization in violation of the CFAA. The hiQ court observed:

Each of these cases is distinguishable in an important respect: none of the data in [either case] was public data. Rather, the defendants in those cases gained access to a computer network . . . and a portion of a website . . . that were protected by a password authentication system. In short, the unauthorized intruders reached into what would fairly be characterized as the private interior of a computer system.

These scenarios, the hiQ court found, were materially different from hiQ's access to LinkedIn user profiles that the users themselves had made public. The court reasoned that "[t]he CFAA was not intended to police traffic to publicly available websites on the Internet," and that the statute "was intended instead to deal with 'hacking' or 'trespass' onto password-protected mainframe computers."

The hiQ court drew a similar distinction between using an automated program to scrape publicly available data on a website and using the same program to scrape data accessible only with a password. It held that "[a] user does not 'access' a computer 'without authorization' by using bots [to scrape data], even in the face of technical countermeasures, when the data it accesses is otherwise open to the public." In contrast, the hiQ court noted: "Where a website or computer owner has imposed a password authentication system to regulate access, it makes sense to apply a plain meaning of 'access,' 'without authorization' such that a defendant can run afoul of the CFAA when he or she has no permission to access a computer or when such permission has been revoked explicitly." The hiQ decision suggests that using a false name or other fictitious personal information to access and scrape publicly available data on a website does not violate the CFAA, whereas employing similar fraudulent techniques to access and scrape password-protected data may contravene it.


Sandvig v. Sessions

A decision by the U.S. District Court for the District of Columbia reinforces the distinction recognized by the hiQ court. In Sandvig v. Sessions, the plaintiffs, a group of professors and a media organization, wanted to research whether the algorithms that various employment and housing websites use to automate decisions produce discriminatory outcomes. The plaintiffs planned to conduct audit tests by creating false user profiles to measure how these websites processed their information and by deploying bots to scrape data from the sites. Because these activities would violate the websites' terms of service, the plaintiffs alleged in their complaint that they had to choose between refraining from their testing and research and exposing themselves to criminal prosecution under the Access provisions of the CFAA (which prohibit accessing a computer "without authorization"). The plaintiffs therefore challenged the Access provisions of the CFAA as violating their First Amendment rights.

The Sandvig court held that the plaintiffs had standing and allowed to proceed their claim that the Access provisions of the CFAA, "as applied" to the plaintiffs, unconstitutionally restrict their protected speech. In reaching this conclusion, the district court examined whether there was a credible risk of criminal prosecution against the plaintiffs for their proposed conduct. The Sandvig court found a circuit split over what activity constitutes access "without authorization" under the CFAA. The Second, Fourth and Ninth Circuits have adopted a narrower interpretation, holding that this language prohibits only unauthorized access to information. In contrast, the First, Fifth and Eleventh Circuits have adopted a more expansive interpretation, holding that it also covers unauthorized use of information that a defendant was authorized to access only for specific purposes. The district court held the narrow interpretation to be the "best reading of the statute."

Applying the narrow interpretation, the Sandvig court found that "most of the plaintiffs' proposed activities fall outside the CFAA's reach" and that "[s]craping or otherwise recording data from a site that is accessible to the public is merely a particular use of information that plaintiffs are entitled to see." The court continued: "Employing a bot to crawl a website or apply for jobs may run afoul of a website's ToS, but it does not constitute an access violation when the human who creates the bot is otherwise allowed to read and interact with that site," because "bots are simply technological tools for humans to more efficiently collect and process information that they could otherwise access manually."

However, the Sandvig court did not condone all of the plaintiffs' activities, noting that "only [plaintiffs'] plan to create fictitious user accounts on employment [websites] would violate the CFAA" because "[u]nlike plaintiffs' other conduct, which occurs on portions of websites that any visitor can view, creating false accounts allows [plaintiffs] to access information on these sites that is both limited to those who meet the owners' chosen authentication requirements and targeted to the particular preferences of the user."

Taken together, the hiQ and Sandvig decisions appear to offer some guidance on how courts will treat data scraping. Both decisions seem to stand for the proposition that scraping publicly available data from websites is permissible under the CFAA, even if the scraping is accomplished through technological means such as bots and violates the website's terms of service. However, both decisions also seem to draw a line when visitors bypass an authentication system, such as a password gate, to access data that is not available to the general public. Under those circumstances, the decisions indicate that such conduct constitutes access "without authorization" or "in excess of authorization" and runs afoul of the CFAA. Thus, under these two decisions, using false names or fictitious personal information to establish accounts that bypass password safeguards would likely violate the Access provisions of the CFAA.

Although the hiQ and Sandvig decisions appear to condone the scraping of publicly available data from websites, data scrapers and aggregators should exercise caution. The hiQ decision not only limits its holding to the appropriateness of injunctive relief in that specific factual context but is also currently on appeal before the Ninth Circuit. Moreover, the Sandvig decision arose from unique circumstances: the plaintiffs there were scraping data for a non-competitive, academic purpose rather than a commercial one. Courts may reach a different result where data is scraped for commercial gain or competitive advantage. Despite these caveats, the hiQ and Sandvig decisions reflect a positive trend for data scrapers.

Hanley Chew is Of Counsel in the Litigation Group with Fenwick & West. He focuses his practice on privacy and data security litigation, counseling and investigations, as well as intellectual property and commercial disputes affecting high-technology and data-driven companies.