National law enforcement and intelligence agencies are considered to be among the adversaries most likely to be motivated to attempt a re-identification, and to have the necessary tools to do so.
The “many eyes” (the Five Eyes and the larger alliances built on it) and law enforcement agencies use information from the advertising ecosystem as a cheap and easy way of tracking individual users across multiple devices, locations and accounts. The adoption of these new tools does not come as a surprise given their low cost compared to other forms of surveillance. It also seems to be a response to public and political pressure to investigate crimes online, especially when, after a high-profile crime, journalists unearth social media profiles full of warning signs.
Law enforcement uses social media in many ways, including:
Browsing social media
Creating accounts for sharing information with law enforcement
Obtaining information from social media companies
Creating fake profiles and personas
Commonly used data mining techniques are:
Entity extraction automatically identifies people, organisations, vehicles and personal details in unstructured data such as police reports. Even if entity extraction provides only basic information, it can be used as auxiliary information to accelerate an investigation by rapidly providing precise details from large amounts of unstructured data. Applied to data from social media such as Instagram, Facebook, and Twitter, it enables a structural de-anonymisation attack leading to identity disclosure and link disclosure. It also means that the activity of particular people can be monitored, which is a content disclosure threat (see the last point in this list).
Clustering techniques group similar characteristics together in classes in order to gain intelligence by maximising or minimising similarities; for example, to identify suspects or criminal groups conducting crimes in similar ways. Clustering techniques can be applied to discover criminal relations by cross-referencing entities in criminal records.
Association rules are used to discover recurring items in databases in order to create pattern rules and detect potential future events. For example, sequential pattern mining, a form of association rule mining, is useful in network security for identifying recurring sequences of items in order to define patterns and prevent attacks.
Classification for analysing unstructured data to discover common properties among criminal entities. It has been used together with inferential statistics techniques to predict crime trends. This technique can dramatically narrow down different criminal entities and organise them into predefined classes.
String comparison is used to reveal deceptive information in criminal records by comparing structured text fields. This is computationally intensive.
Text mining techniques were considered to be the next step in the evolution of data mining and criminal intelligence technologies in 2016.
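To make one of the techniques above more concrete, the following is a minimal sketch of string comparison over structured text fields. It assumes Python's standard-library `difflib.SequenceMatcher` as the similarity measure, which stands in for whatever matching method is actually used in practice. The records, the field names (`id`, `name`, `dob`), the function names and the `0.8` threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: every name, date and field here is invented.
RECORDS = [
    {"id": 1, "name": "Jonathan Smith", "dob": "1980-03-14"},
    {"id": 2, "name": "Jonathon Smyth", "dob": "1980-03-14"},
    {"id": 3, "name": "Maria Lopez", "dob": "1975-11-02"},
]


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means identical after case folding."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def near_duplicates(records, threshold=0.8):
    """Return (id, id, score) for record pairs that share a date of birth
    and whose names are similar but not necessarily identical.

    The threshold is an assumed value, not a recommended one.
    """
    pairs = []
    # Every record is compared with every other record, so the number of
    # comparisons grows quadratically with the number of records.
    for left, right in combinations(records, 2):
        if left["dob"] != right["dob"]:
            continue  # exact-match field used as a cheap pre-filter
        score = similarity(left["name"], right["name"])
        if score >= threshold:
            pairs.append((left["id"], right["id"], round(score, 2)))
    return pairs


if __name__ == "__main__":
    for left_id, right_id, score in near_duplicates(RECORDS):
        print(f"records {left_id} and {right_id} look alike (score {score})")
```

In this sketch, records 1 and 2 are flagged because their names differ by only two characters and their dates of birth match. The all-pairs loop also suggests why the technique becomes computationally expensive on large record sets.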
Law enforcement agencies (and intelligence agencies) claim that this is an inexpensive strategy with little impact on people’s privacy because it relies only on so-called publicly available information. A tweet is considered not private because, by its nature, you cannot control its audience. Does that automatically make it public, or place it within the space of the police? Both Evanna Hu and Millie Graham Wood argue that social media do not fit easily into either the public or the private category and are instead a pseudo-private space, in which there is an expectation of privacy from the state.