It has not been a good year for LinkedIn, the world’s largest professional networking site, which Microsoft acquired in 2016 for $26.2 billion. Before we explain why, we should note that back in 2012, a LinkedIn data breach exposed the email addresses and passwords of 117 million users, at that time one of the largest corporate data breaches ever. This year, LinkedIn has again been in the news, and not in a good way, for two huge user data leaks (rather than internal data breaches), both resulting from sustained and systematic Web scraping campaigns that leveraged the power of bots.
In April this year, an archive of scraped data covering over 500 million LinkedIn users, including email addresses, full names, phone numbers, professional titles, work-related data, physical addresses, location data, and many other pieces of personal information, was uploaded by hackers and offered for sale on an underground Dark Web forum. Then on June 22, a hacker announced the sale of user data files for a total of 700 million LinkedIn members on a similar underground website. This means that over 92% of current LinkedIn users have had their data scraped and aggregated with personal data about them from other sources such as analytics, marketing, and other databases. All of this user data is now available to anyone interested in buying the whole trove, or smaller chunks of the database split by region or demographic (with payment expected in Bitcoin or other cryptocurrencies to anonymize the transaction).
In the underground market for personal data, some sellers even claim to verify or cross-reference various pieces of data to guarantee the accuracy of their database to potential buyers. This practice is especially common for verified payment card data and username/password pairs. The number of personal data points in the archive strongly suggests that data from multiple other sources was combined with the scraped LinkedIn user data in order to increase both its potential for abuse and its value to cybercriminals.
In response, LinkedIn announced that “Our teams have investigated a set of alleged LinkedIn data that has been posted for sale. We want to be clear that this is not a data breach and no private LinkedIn member data was exposed. Our initial investigation has found that this data was scraped from LinkedIn and other various websites and includes the same data reported earlier this year in our April 2021 scraping update.”
LinkedIn’s Terms of Service clearly prohibit the use of “…any third party software, including “crawlers”, bots, browser plug-ins, or browser extensions (also called “add-ons”), that scrapes, modifies the appearance of, or automates activity on LinkedIn’s website.” The company has fought legal battles in the past against companies such as HiQ (a workforce analytics company), and a United States appeals court ruled in that case that scraping a public website without its owner’s permission did not violate the Computer Fraud and Abuse Act. LinkedIn users had the option of making their profile information either public or private, and HiQ was only scraping non-private profiles. Hence data protection regulations would apply to private profiles but not to publicly visible profile information. The court also prohibited LinkedIn from blocking HiQ’s systematic scraping campaign during the litigation, as blocking would interfere with HiQ’s contracts with customers who relied on the scraped data.
About 1 million of the LinkedIn data files from this recent leak were provided for free to researchers who had contacted the sellers to ascertain their authenticity, and the researchers found that full names, email addresses, phone numbers, and other data could indeed be correlated with LinkedIn users. This massive trove of user data is likely to become yet another valuable resource for cybercriminals and other nefarious parties intending to carry out phishing, financial fraud, account takeover, impersonation, and other forms of targeted attack. Let’s not overlook that in April this year, 533 million Facebook users had their personal data, including names, email addresses, phone numbers, birthdays, and other information, hacked and distributed via underground sites that specialize in buying and selling PII (Personally Identifiable Information).
According to the privacy research and product review website that broke the news of this LinkedIn data leak, the party (or parties) that posted the scraped data archive claims to have obtained the data by exploiting an official LinkedIn API. The asking price for all this data is reportedly just USD 5,000!
While only a small percentage of LinkedIn users provide more than the basic information required to have a profile there, and many provide no more than a personal email address and their real name, this information could be misused in several ways (especially since the data archive appears to contain data sourced from other databases as well). Affected users could have their LinkedIn accounts hacked, and their email addresses, phone numbers, and other information could be targeted by spammers and robocallers. If users reuse the same email addresses and passwords on other websites, those credentials could then be used in credential stuffing attacks, in which attackers deploy bots to test username and password pairs against the other websites and applications they intend to attack. With enough personal data, fraudsters can subject users, as well as their relatives and friends, to “social engineering” attacks that gain the victims’ trust and defraud them in various ways.
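To make the credential stuffing pattern concrete, here is a minimal Python sketch of how a defender might spot it in login logs: one source trying many different usernames and failing most of the time. The `LoginAttempt` record, the `flag_stuffing_sources` function, and both thresholds are hypothetical names and values chosen for illustration, not part of any LinkedIn or vendor system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginAttempt:
    source: str      # hypothetical client identifier, e.g. an IP address
    username: str
    succeeded: bool


def flag_stuffing_sources(attempts, min_usernames=5, min_failure_rate=0.8):
    """Return sources that tried many distinct usernames and mostly failed.

    The thresholds are illustrative assumptions, not recommended values.
    """
    usernames = defaultdict(set)
    totals = defaultdict(int)
    failures = defaultdict(int)

    for attempt in attempts:
        usernames[attempt.source].add(attempt.username)
        totals[attempt.source] += 1
        if not attempt.succeeded:
            failures[attempt.source] += 1

    flagged = set()
    for source, total in totals.items():
        many_accounts = len(usernames[source]) >= min_usernames
        mostly_failing = failures[source] / total >= min_failure_rate
        if many_accounts and mostly_failing:
            flagged.add(source)
    return flagged


if __name__ == "__main__":
    # One source cycles through ten leaked username/password pairs;
    # another is a single user who mistyped a password once.
    log = [LoginAttempt("203.0.113.7", f"user{i}@example.com", i == 3)
           for i in range(10)]
    log += [LoginAttempt("198.51.100.2", "alice@example.com", False),
            LoginAttempt("198.51.100.2", "alice@example.com", True)]
    print(flag_stuffing_sources(log))
```

A heuristic like this only catches attacks that come from a small number of sources. Bots that spread their attempts across many addresses would stay under both thresholds.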
What does all of this mean for enterprises and other organizations that are at similar risk of scraping attacks that carry out systematic data exfiltration? Web scraping has long been one of the most widespread tasks delegated to bots. With most organizations collecting and leveraging ever larger amounts of data, scraping is going to become a more serious threat than ever before. The exponential growth in the use of Application Programming Interfaces (APIs), the critical interconnectors that facilitate data transfers between various Web, mobile, and database systems, has made them enticing targets for attackers looking to exploit security vulnerabilities for financial gain. Even as security measures for data storage and transit keep getting better at protecting confidential data, bot technologies also keep getting better at mimicking humans. The latest 4th-generation bots can learn and emulate how humans use websites and mobile applications, and are thus extremely difficult to detect using conventional tools such as Web Application Firewalls (WAFs), Access Control Lists (ACLs), and IP address reputation lists of known bad bot originators.
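As a rough illustration of why a static IP reputation list struggles against such bots, the Python sketch below checks client addresses against a small blocklist using the standard `ipaddress` module. The `BAD_NETWORKS` list and the `is_listed` function are hypothetical, and the address ranges are reserved documentation ranges rather than real bot originators.

```python
import ipaddress

# Hypothetical reputation list. These are reserved documentation ranges,
# used here only as stand-ins for "known bad" networks.
BAD_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_listed(ip, networks=BAD_NETWORKS):
    """Return True if the address falls inside any listed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    # A bot sending from a listed network is blocked...
    print(is_listed("192.0.2.44"))
    # ...but the same bot routed through an unlisted address passes,
    # because the list knows nothing about how the client behaves.
    print(is_listed("203.0.113.9"))
```

The check looks only at where a request comes from, not at what the client does, so a scraper that rotates through addresses absent from the list is never examined at all.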
The only way that organizations can reliably and effectively secure their data from Web scraping campaigns and other types of bot attacks is to implement a dedicated bot management solution that protects their websites, their mobile applications, and the APIs that connect their internal and external services. To learn more about how we can help your organization prevent data scraping and other harmful bot activity on your websites, mobile applications, and APIs, get in touch with us at firstname.lastname@example.org.