What’s the attack?
A deanonymisation attack is when seemingly incidental data is used to infer someone’s identity online.
Imagine an authoritarian government has a zero-day exploit that they want to deploy to a journalist’s computer. They have their state news website all ready to deliver it. But they can’t just infect every visitor to the site, as that would raise too much suspicion.
No, they want to deliver it only to one specific journalist.
But they only have the journalist’s email address.
How can they know when that journalist is browsing their site, so they can deliver the exploit to them, and only them?
With a deanonymisation attack.
The government shares a resource with the journalist via their email address on a platform such as Dropbox, YouTube or TikTok. Then they embed the resource (let’s say a YouTube video) in the state news website. Only the journalist has permission to view this resource, so it will only load in their browser and nobody else’s.
They lure the journalist to the site, and the YouTube video plays in their browser. So far, so good. So how do they deanonymise the journalist?
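Neither the site they visited nor the hardware they were using needs to contain a vulnerability in the traditional sense of the word. Instead, a script on the attacker’s page watches the victim’s CPU cache to tell whether the private video was loaded.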
How is the cache probed?
A variation of the Prime + Probe technique is used. It targets the last-level cache (LLC), which is generally shared between all CPU cores. The steps are:
- the attacker fills the victim’s CPU cache with their own data, setting it to a known state (Prime)
- they wait for a bit (‘meantime’)
- the attacker reloads their data on the victim’s machine and measures how long it takes (Probe)
If their data loads quickly, they know the victim didn’t access anything in the meantime, because the attacker’s data was still in the cache. If it takes longer, they know the victim accessed data in the meantime that pushed the attacker’s data out of the cache.
In our scenario, the attacker knows that only the victim can load the YouTube video. If the probe results show the cache activity of that video loading, the attacker can infer that the visitor is the victim.
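The Prime + Probe loop can be illustrated with a toy model. The sketch below simulates one set-associative cache with LRU replacement. The cache geometry, the hit and miss latencies and the attacker's eviction set are all illustrative assumptions. A real browser-based attack cannot see addresses directly and has to rely on noisy JavaScript timers.

```python
from collections import OrderedDict

HIT_NS = 10    # assumed latency of a cache hit (illustrative only)
MISS_NS = 100  # assumed latency of a miss served from memory (illustrative only)


class ToyCache:
    """Toy set-associative cache with LRU replacement, not a model of a real LLC."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int) -> int:
        """Touch an address and return the simulated latency of that access."""
        cache_set = self.sets[addr % len(self.sets)]
        if addr in cache_set:
            cache_set.move_to_end(addr)  # hit: mark as most recently used
            return HIT_NS
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # miss in a full set: evict the LRU line
        cache_set[addr] = True
        return MISS_NS


def prime_probe(cache: ToyCache, eviction_set: list[int], victim_accesses: list[int]) -> int:
    """Return the total probe time; a slower probe means the victim touched the set."""
    for addr in eviction_set:          # Prime: fill the set with the attacker's lines
        cache.access(addr)
    for addr in victim_accesses:       # meantime: the victim does its own work
        cache.access(addr)
    return sum(cache.access(addr) for addr in eviction_set)  # Probe: reload and time


# Addresses 0, 16, 32, 48 and 64 all map to set 0 of a 16-set, 4-way cache.
quiet = prime_probe(ToyCache(16, 4), [0, 16, 32, 48], victim_accesses=[])
busy = prime_probe(ToyCache(16, 4), [0, 16, 32, 48], victim_accesses=[64])
print(quiet, busy)  # 40 400
```

In this toy LRU model a single victim access makes every probe access miss, because each reload evicts the next attacker line before it is checked. Real caches, with other replacement policies and background noise, give a much less clean signal. That noise is why the attack needs a trained classifier rather than a simple threshold.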
The attack relies on a machine learning model to distinguish the target’s cache activity from everyone else’s. So, the attack consists of two phases: a training phase and an online phase.
The attacker trains a machine learning classifier to recognise the cache activity of a visitor who loaded the video. The researchers say a solid classifier can be trained on a dataset of 200 cache samples (100 target and 100 non-target). They used logistic regression for single-target attacks and an LSTM for multi-target attacks.
The attacker prepares a resource (the researchers used Gmail, YouTube, Google Drive, Twitter, Facebook and others) and shares it only with the victim. The opposite also works: the resource is shared with everyone except the victim.
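To show what training a single-target classifier involves, here is a minimal logistic regression fitted with plain batch gradient descent on synthetic data. Everything about the data is hypothetical. Each "trace" is a list of made-up per-round probe times, and target traces get extra latency in a fixed window (`BURST`) that stands in for the video loading. The features are standardised before training. This is not the researchers' code or data, and in practice a library implementation would be the usual choice.

```python
import math
import random

TRACE_LEN = 20        # hypothetical: probe rounds per trace
BURST = range(8, 14)  # hypothetical: rounds in which the private video loads


def synthetic_trace(rng: random.Random, is_target: bool) -> list[float]:
    """Made-up probe times; target traces are slower while the video loads."""
    trace = [rng.random() for _ in range(TRACE_LEN)]
    if is_target:
        for t in BURST:
            trace[t] += 3.0
    return trace


def make_dataset(rng: random.Random, n_per_class: int) -> tuple[list[list[float]], list[int]]:
    xs, ys = [], []
    for label in (1, 0):  # 1 = target (video loaded), 0 = non-target
        for _ in range(n_per_class):
            xs.append(synthetic_trace(rng, label == 1))
            ys.append(label)
    return xs, ys


def fit_standardiser(xs: list[list[float]]):
    """Return a function that rescales each feature to zero mean and unit variance."""
    cols = list(zip(*xs))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return lambda x: [(v - m) / s for v, m, s in zip(x, means, stds)]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for large negative z
    return e / (1.0 + e)


def predict(w: list[float], b: float, x: list[float]) -> float:
    """Probability that trace x came from the target."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(xs, ys, lr: float = 0.1, epochs: int = 200):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


rng = random.Random(0)
train_x, train_y = make_dataset(rng, 100)  # 100 target + 100 non-target samples
scale = fit_standardiser(train_x)
w, b = train_logreg([scale(x) for x in train_x], train_y)

test_x, test_y = make_dataset(rng, 50)     # fresh synthetic traces for evaluation
preds = [int(predict(w, b, scale(x)) >= 0.5) for x in test_x]
accuracy = sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

The synthetic classes are far apart, so this sketch separates them easily. Real cache traces are much noisier, which is where the researchers' choice of model and sample count matters.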
Their attack page features a script that:
- initiates cache measurement
- loads the resource in an <iframe> HTML element or in a new tab or browser window
- records cache activity for a few seconds (around 3 seconds is enough)
- closes the resource
- sends the cache trace to the attacker’s server
The trained classifier can then identify the victim from the trace.
The researchers even found a way to scale this to multiple targets using YouTube playlists, sharing a different video with each target.
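The attack page's sequence of steps can be sketched as a driver loop. The real script runs as JavaScript in the victim's browser. This Python version only shows the ordering of the steps. The probe, the resource loading and the upload are passed in as hypothetical callables (`probe_once`, `open_resource`, `close_resource`, `send_trace`). The number of probe rounds standing in for roughly 3 seconds of measurement is also an assumption.

```python
from typing import Callable


def run_attack_page(
    probe_once: Callable[[], float],            # hypothetical: one Prime+Probe round, returns its probe time
    open_resource: Callable[[], None],          # hypothetical: loads the shared video (iframe or popup)
    close_resource: Callable[[], None],         # hypothetical: closes the iframe/popup again
    send_trace: Callable[[list[float]], None],  # hypothetical: uploads the trace to the attacker's server
    slots: int = 300,                           # assumed: probe rounds covering roughly 3 s
    open_at: int = 10,                          # assumed: baseline rounds before loading the resource
) -> list[float]:
    """Record a cache trace around the loading of the shared resource."""
    trace: list[float] = []
    for t in range(slots):
        if t == open_at:
            open_resource()         # load the resource while measurement is running
        trace.append(probe_once())  # measurement starts before the load and continues after it
    close_resource()                # close the resource once recording is done
    send_trace(trace)               # hand the trace to the attacker's server
    return trace


# Usage with stand-ins: the "cache" looks busier once the resource is open.
state = {"open": False}
trace = run_attack_page(
    probe_once=lambda: 5.0 if state["open"] else 1.0,
    open_resource=lambda: state.update(open=True),
    close_resource=lambda: state.update(open=False),
    send_trace=lambda tr: print(f"sending {len(tr)} samples"),
    slots=20,
    open_at=5,
)
```

On the server side, the uploaded trace would then be fed to the classifier trained earlier, which decides whether this visitor is the target.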
The researchers propose a browser extension, Leakuidator+, that keeps track of which popup windows and tabs were created by which sites. It does this by monitoring the webNavigation API and tracing parent-child relationships.
If a popup window or tab’s domain differs from that of the page that opened it, the extension applies a defence that strips cookies and delays the loading of shared resources. This delay should be enough to invalidate any deanonymisation attempt via cache timing measurements.
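The core decision described above can be sketched as a small piece of bookkeeping. This is a hypothetical Python model, not the extension's code, and it simplifies in three ways. First, `on_popup_created` stands in for the parent/child information a WebExtension can obtain from `webNavigation` events; for example, `onCreatedNavigationTarget` reports the source tab and the new tab. Second, domains are compared by exact hostname rather than by registrable domain. Third, the delay value is made up.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Decision:
    strip_cookies: bool
    delay_s: float


@dataclass
class PopupTracker:
    """Hypothetical model of Leakuidator+'s popup/tab bookkeeping."""

    delay_s: float = 2.0                                   # made-up delay, not from the paper
    opener: dict[int, int] = field(default_factory=dict)   # child tab id -> parent tab id
    tab_url: dict[int, str] = field(default_factory=dict)  # last known URL per tab

    def on_tab_navigated(self, tab_id: int, url: str) -> None:
        self.tab_url[tab_id] = url

    def on_popup_created(self, parent_id: int, child_id: int, url: str) -> None:
        # Record the parent/child relationship when a site opens a popup or tab.
        self.opener[child_id] = parent_id
        self.tab_url[child_id] = url

    def decide(self, tab_id: int) -> Decision:
        parent_id = self.opener.get(tab_id)
        if parent_id is None:
            return Decision(strip_cookies=False, delay_s=0.0)  # not opened by a site
        child_host = urlsplit(self.tab_url[tab_id]).hostname
        parent_host = urlsplit(self.tab_url.get(parent_id, "")).hostname
        if child_host != parent_host:
            # Cross-domain popup: strip cookies and delay the load.
            # (An unknown opener URL also lands here, erring on the safe side.)
            return Decision(strip_cookies=True, delay_s=self.delay_s)
        return Decision(strip_cookies=False, delay_s=0.0)


tracker = PopupTracker()
tracker.on_tab_navigated(1, "https://news.example/article")
tracker.on_popup_created(1, 2, "https://www.youtube.com/watch?v=VIDEO_ID")
print(tracker.decide(2))  # Decision(strip_cookies=True, delay_s=2.0)
```

Comparing exact hostnames is stricter than a real same-site check. For example, `www.news.example` opened from `news.example` would be treated as cross-domain here.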