What is Web Scraping and Why Should Employers Be Concerned?

By Heather J. Van Meter - Bullard Law

May 17, 2022

The Ninth Circuit Court of Appeals recently ruled in hiQ Labs, Inc. v. LinkedIn Corp. that “web scraping” of public data is likely not illegal, allowing web scraping company hiQ Labs, Inc. to continue to “web scrape” LinkedIn’s website, including all of LinkedIn’s substantial publicly-available member profile data. Applying the U.S. Supreme Court’s earlier ruling in Van Buren v. United States, the Ninth Circuit affirmed a preliminary injunction that bars LinkedIn from blocking hiQ Labs’ access to the LinkedIn website’s public data, which hiQ uses for its own or its clients’ purposes. The Ninth Circuit held that such activity likely does not violate the federal Computer Fraud and Abuse Act, which does not prohibit accessing publicly-available information on websites.

LinkedIn reports it has over 60 million users worldwide, and member profiles often include detailed histories of education and work experience, as well as group memberships, links to articles, and other individualized data. This means that all public information on the profiles of LinkedIn’s 60+ million members, including education, work history, and contact information, can be gathered and copied, added to databases and spreadsheets, and used or sold to other companies. As a result, LinkedIn members face a conundrum: leave LinkedIn to protect their privacy (which is probably already lost!), or continue to use LinkedIn for the benefits of the site’s networking, promoting personal or company goods and services, job offerings and searching, and affiliations with potential employers or clubs or charities, among other benefits.

Most web scraping is now done by computers programmed to search and scrape data from specific company websites (such as company competitors), specific industries, or targeted data sets such as product model numbers and pricing information. Individuals, companies, and other groups can program web scraping themselves, but most hire special web scraping companies to run computer programs to obtain the desired data, put it into a spreadsheet or other requested format, and then use the formatted data sets for any intended purposes. All data that is publicly available on websites appears to be fair game for web scraping. The process is akin to the old-school use of white and yellow pages data, but with more massive amounts of specific information available for scraping.
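To make the process described above concrete, here is a minimal sketch in Python of what a scraping program does: read a public page, pick out the targeted fields, and put them into a spreadsheet-style format. The sample page, the "model" and "price" labels, and the names ListingParser and scrape_to_csv are all assumptions made up for illustration. A real scraper would first download the page and would usually be far more elaborate.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical public product page, invented for this illustration.
# A real scraper would download this HTML from a website first.
SAMPLE_HTML = """
<table>
  <tr><td class="model">TX-100</td><td class="price">$18,500</td></tr>
  <tr><td class="model">TX-250</td><td class="price">$31,900</td></tr>
</table>
"""


class ListingParser(HTMLParser):
    """Collects the text of <td class="model"> and <td class="price"> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (model, price) pairs
        self._field = None   # the targeted cell we are inside, if any
        self._current = {}   # cells collected for the row in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "td" and cls in ("model", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current:
            self.rows.append(
                (self._current.get("model"), self._current.get("price"))
            )
            self._current = {}


def scrape_to_csv(html):
    """Return the scraped rows as CSV text, ready to open as a spreadsheet."""
    parser = ListingParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["model", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

The point of the sketch is how little is required: anything a visitor can see on a public page, a short program can collect in bulk.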

Why Should HR Professionals Be Concerned?

HR professionals should consider what information is publicly available on or through their employer’s website and whether that information should be available to the public, including competitors, employee poachers, scammers, and fraudsters. For example, many company websites contain direct-dial phone numbers and email addresses for some or all employees, leaving employees open to constant contact, phishing and hacking scams, and harassment. As another example, company websites may contain links to human resources forms for employees to complete and submit, such as timecards, beneficiary designations, or payroll direct deposit forms. If the forms and employee information are publicly available on the employer’s website, the forms could be downloaded, populated with employee information from the website (or combined with other public information on the employee), and submitted through the company’s website. Without proper controls, employees could be subjected to theft or fraud. Furthermore, the employer could be held legally responsible for the employee’s losses due to a lack of website controls. Although insurance may be available for some types of litigation liability from web scraping, many insurance policies do not presently cover such potential liabilities.

HR professionals must consider how users interact with the employer’s website. Some job applicants apply through a website, but depending on how the application process is set up, the job applicant information may be publicly available and subject to web scraping. Existing employees may even post information that becomes publicly available through an employer’s website, depending on how an intranet or instant messaging system is configured. For example, event or meeting schedules may be posted on websites, making the date, location, and topics open to web scraping.

HR professionals should also consider what information employees have publicly available on other websites that could reflect on the employer or put the employer or employee at risk. If a list of employees with location and contact information is on the employer’s website, even if deep into the website, it will be found by data scrapers. As another example, if an employee’s name, photo, direct phone number, and direct work email address are on the company’s website, and the employee also has a public Facebook page with the employee’s personal address or phone number or other personal data (involvement in an extremist group perhaps?), the information can be combined to target the employee for anything from solicitation for other jobs, to harassment day and night, or even extortion.

Conversely, employers should consider what public information is available about their employees. Using web scraping together with readily available facial recognition software, others could target an employer for employing people who are ex-convicts, who attend certain events, or who associate with “out of favor” groups, which could be anything from a particular religion to Anarchists to Communists to Planned Parenthood supporters to the Proud Boys and KKK. Taking the recent tragic racially-motivated shooting in Pittsburgh as an example, an employer could even be targeted if it touts on its website that it has a majority of African-American employees or is Asian or Muslim or Jewish-owned.

Private and Public Employers Generally Need to be Aware of “Web Scraping”

While web scraping can affect human resources activities, employers generally should be aware that all public content on the employer’s website is fair game for scraping and use. For example, a tractor sales company that lists new and used tractors for sale on its website could have its model numbers and prices scraped, and a competitor could then undercut its pricing or even direct sales to competitor websites. This already occurs with house and car listings. Phishing, hacking, other scams, and even ransomware can result from web scraping. If customer inquiry or sales data is input through a website, this information could also be web scraped depending on how it is input.

Web scraping can also be used to attack an employer competitively in other ways, such as by reducing a company’s SEO ratings, finding the embedded words and coding used to promote the employer’s website and using them against the employer, or learning of new or ongoing projects referenced on the employer’s website, along with other trade secret or competitive data.

Public employers are not immune from harm by web scraping. Indeed, public employers may be more at risk because minutes of public meetings, video links to public hearings, and large amounts of public information are often accessible through a public employer’s website. For example, if a public meeting discussed an employee issue or discipline, and the meeting video or minutes are linked to the website, web scraping can find the information for use and abuse. Likewise, if a public meeting is videotaped, but the video starts early, ends late, or is not turned off during the executive session, it is accessible by web scraping if it is linked to the website. Although public records and meetings laws typically make large amounts of information publicly available, the public records requests process often creates a layer of protection or review before the information is released, whereas web scraping has no such layers of protection or review.

Possible Protective Actions to Consider

Some technical actions can be taken to protect against web scraping. “CAPTCHA” or similar software programs can be required on a website before access to content is possible. Web scraping programs typically send a large number of “requests” from the same IP address in a short time, a pattern that can be identified as a web scraping attempt and stopped by the employer or the employer’s IT department or IT security provider.
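For IT staff, the idea of spotting repeated requests from one IP address can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name RequestRateMonitor, the limit of 3 requests, and the 10-second window are invented for the example. In practice this kind of check is usually handled by a web server, firewall, or security provider, not by custom code.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flags an IP address that makes too many requests in a short window."""

    def __init__(self, max_requests=100, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def record(self, ip, timestamp):
        """Record one request; return True if the IP now exceeds the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the time window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Illustrative limits only: more than 3 requests in 10 seconds is flagged.
monitor = RequestRateMonitor(max_requests=3, window_seconds=10)
for second in (0, 1, 2, 3):
    if monitor.record("203.0.113.7", second):
        print(f"Possible scraping from 203.0.113.7 at t={second}s")
```

A check like this is easy to evade, for example by spreading requests across many IP addresses, so it is best treated as one layer of protection among several.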

To protect against excessive contact, hacking, phishing, and other scams using employee direct email addresses and phone numbers, HR professionals should consider using one or a few centralized phone numbers and one or a few centralized email addresses on the website.

HR professionals should take the time to review all employer website content from a “risk exposure” mindset, consider what information could put employees or the employer at risk, and whether there are less risky means of presenting the necessary information on the website. For example, online payroll direct deposit forms could be shifted to a password-protected intranet location, or steps should be taken to ensure wet signatures on such forms that are verified and confirmed with the employee in person.
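One part of such a review, finding individual email addresses and direct-dial phone numbers in website text, can be partly automated. The following Python sketch is a starting point only. The function name find_exposed_contacts, the sample text, and the simplified patterns (which recognize only common U.S.-style phone formats) are assumptions for illustration, and a manual review would still be needed.

```python
import re

# Simplified patterns for illustration; they will not catch every format.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")


def find_exposed_contacts(page_text, shared_addresses=()):
    """List individual emails and phone numbers found in page text.

    Addresses in shared_addresses (such as a centralized inbox) are
    treated as intentional and left out of the results.
    """
    shared = {address.lower() for address in shared_addresses}
    emails = {e.lower() for e in EMAIL_RE.findall(page_text)} - shared
    phones = set(PHONE_RE.findall(page_text))
    return {"emails": sorted(emails), "phones": sorted(phones)}


# Hypothetical page text, invented for this example.
sample = (
    "Contact Jane Doe at jane.doe@example.com or (503) 555-0142. "
    "General inquiries: info@example.com."
)
print(find_exposed_contacts(sample, shared_addresses=["info@example.com"]))
```

Run against the text of each public page, a check like this gives HR a list of individual contact details to consider replacing with centralized ones.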

HR professionals, and employers generally, should consider obtaining consent from each employee before information about that employee (including photos and contact information) is included on the employer’s website. Employee photos can be obtained from web scraping and then combined with facial recognition software to create many risks and problems for the employee and employer. Bullard Law can assist employers with employee website content forms.

Although HR professionals may also wish to review other publicly-available information on employees, extreme caution is in order in this process because state and federal laws protect employee information and its use in the employment context. If an employer learns of political views, questionable personal Facebook page content, or other information, the employer may not be able to collect or use this information. For specific information on what employers can and cannot do, contact Bullard Law.

www.bullardlaw.com
