In this blog post, we present a general overview of browser fingerprinting in the context of bot detection. Effective bot detection relies on a ‘fingerprint’: many pieces of information obtained from a browser that together identify it and help determine whether it is being used by a bot or a human. The first generation of user identification and tracking techniques was very simple, using a few identifiers such as cookies to recognize visitors. As browsers became more sophisticated, browser fingerprinting emerged not only as a user identification and tracking technique used by various marketing and analytics firms, but also as a building block for specialized bot detection solutions such as Radware Bot Manager.
In 2012, researchers at UC San Diego published a paper titled ‘Pixel Perfect: Fingerprinting Canvas in HTML5’ describing how the HTML5 canvas could be used to create digital fingerprints of web users. In the last few years, this form of ‘canvas fingerprinting’ has become widespread, enabled by widely adopted HTML5 browser standards that let a website serve scripts asking a visitor’s browser to return specific information for the purpose of fingerprinting. Bot detection technologies have thus evolved to leverage the data accessible from a visitor’s browser, analyzing it with specialized algorithms to identify the nature of a visitor to a website, application, or API.
The Key Elements in a Browser Fingerprint
Some of the key elements that can be used to fingerprint a browser are:
- System platform (e.g. Win32, Linux x86)
- System language (e.g. en-US)
- Screen resolution and color depth
- System time zone
- Installed browser extensions and plugins such as QuickTime, Flash, Java or Acrobat and their versions
- Fonts installed on the computer, as reported by Java or Flash
- Yes/no information stating whether the browser accepts various kinds of cookies and ‘super-cookies’
- A hash of the image generated by canvas fingerprinting
- A hash of the image generated by WebGL fingerprinting
- Whether the browser is sending the Do Not Track header (Y/N)
- The browser’s touchscreen support
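As a rough illustration of how elements like these might be combined, the sketch below hashes a dictionary of reported attributes into a single identifier on the server side. It is a minimal sketch, not the method used by any particular product: the function name `fingerprint_id`, the attribute keys, and the sample values are all hypothetical, and the canvas and WebGL hashes are placeholders for values a client-side script would supply.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Return a stable hex digest for a dictionary of browser attributes."""
    # Serialize with sorted keys and fixed separators so the same attributes
    # always produce the same byte string, regardless of insertion order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values that a client-side script might report.
sample = {
    "platform": "Win32",
    "language": "en-US",
    "screen": [1920, 1080, 24],  # width, height, color depth
    "timezone": "America/New_York",
    "plugins": ["Acrobat 11.0"],
    "cookies_enabled": True,
    "canvas_hash": "placeholder-canvas-hash",
    "webgl_hash": "placeholder-webgl-hash",
    "do_not_track": False,
    "touch_support": False,
}

print(fingerprint_id(sample))
```

An exact hash like this changes whenever any single attribute changes, so a practical system would likely also keep the individual attributes for fuzzy comparison rather than rely on the digest alone.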
Browser Fingerprinting Methodologies
Commonly used general-purpose browser fingerprinting solutions work by analyzing information such as the elements mentioned above. According to the Electronic Frontier Foundation, “We observe that the distribution of our fingerprint contains at least 18.1 bits of entropy, meaning that if we pick a browser at random, at best we expect that only one in 286,777 other browsers will share its fingerprint. Among browsers that support Flash or Java, the situation is worse, with the average browser carrying at least 18.8 bits of identifying information. 94.2% of browsers with Flash or Java were unique in our sample. By observing returning visitors, we estimate how rapidly browser fingerprints might change over time. In our sample, fingerprints changed quite rapidly, but even a simple heuristic was usually able to guess when a fingerprint was an ‘upgraded’ version of a previously observed browser’s fingerprint, with 99.1% of guesses correct and a false positive rate of only 0.86%.”
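The ‘bits’ in the figures quoted above relate to ‘one in N’ through a base-2 logarithm: a fingerprint shared by one browser in N carries log2(N) bits of identifying information. The sketch below shows that arithmetic, plus the Shannon entropy of a set of observed attribute values. The function names and the sample time zone labels are illustrative assumptions, not taken from the EFF study.

```python
import math
from collections import Counter


def surprisal_bits(probability: float) -> float:
    """Identifying information, in bits, of an outcome with this probability."""
    return -math.log2(probability)


def shannon_entropy_bits(observations) -> float:
    """Shannon entropy, in bits, of the distribution of observed values."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A fingerprint shared by one browser in 286,777 carries about 18.13 bits.
print(round(surprisal_bits(1 / 286_777), 2))

# Four equally common values of one attribute contribute 2 bits.
print(shannon_entropy_bits(["UTC", "EST", "CET", "IST"]))
```

Note that 2 raised to the power 18.1 is about 281,000, so the quoted one-in-286,777 figure corresponds to slightly more than 18.1 bits, which is consistent with the ‘at least’ wording.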
On desktop and mobile browsers, fingerprinting techniques are used to detect bad bots and to ascertain whether the device is a mobile emulator and whether its screen orientation, touch and usage patterns, or other characteristics match those of a human using a mobile device. Bot detection systems also check for mismatches between the claimed OS and browser versions (for example, a genuine visitor cannot be running the latest version of the Edge browser on Windows 7 or an older OS).
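A consistency check of this kind can be sketched as a few rules applied to the attributes a client claims. Everything here is a simplified assumption for illustration: the `ClientClaims` fields, the `find_inconsistencies` function, the single-entry compatibility table, and the touch-point rule are hypothetical, and a real system would need a far larger, regularly updated rule set.

```python
from dataclasses import dataclass
from typing import List

# Illustrative assumption: highest browser major version expected to run on
# a given OS. Only one example entry is included here.
MAX_SUPPORTED_MAJOR = {
    ("Edge", "Windows 7"): 109,
}

MOBILE_OS = {"Android", "iOS"}


@dataclass
class ClientClaims:
    os_name: str           # OS the client claims to run
    browser_name: str      # browser the client claims to be
    browser_major: int     # claimed major version of that browser
    max_touch_points: int  # touch points reported by the client


def find_inconsistencies(claims: ClientClaims) -> List[str]:
    """Return human-readable reasons why the claims look implausible."""
    reasons = []
    limit = MAX_SUPPORTED_MAJOR.get((claims.browser_name, claims.os_name))
    if limit is not None and claims.browser_major > limit:
        reasons.append(
            f"{claims.browser_name} {claims.browser_major} is not expected "
            f"on {claims.os_name}"
        )
    # Heuristic: a device claiming a mobile OS but reporting no touch support
    # may be an emulator or a script with a spoofed user agent.
    if claims.os_name in MOBILE_OS and claims.max_touch_points == 0:
        reasons.append("mobile OS claimed but no touch support reported")
    return reasons


print(find_inconsistencies(ClientClaims("Windows 7", "Edge", 120, 0)))
```

Returning a list of reasons rather than a single yes/no verdict lets these signals be weighed alongside other evidence instead of blocking a visitor on one mismatch.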
Browser fingerprinting is not just a way to identify suspected bot traffic; it can also help analyze visitor data, detect ad and payment fraud, develop omni-channel marketing campaigns, and much more. With their focus on detecting bots, solutions such as Radware Bot Manager do not rely on browser fingerprinting alone but also leverage other methodologies, such as semi-supervised machine learning and intent analysis, to ascertain the intent of every visitor.