Data releases
Three main contexts of data release are assumed here. Each presents a different relationship, and therefore a different level of trust, between the data provider and the data recipient, and with it a different level of control and risk.
Internal secondary research
This context concerns data re-use. For example, clinical trial sponsors store and maintain vast amounts of data collected during clinical studies. The cumulative information may be invaluable for identifying patterns that were not the focus of the original trials. Sponsors are required to obtain consent for such use of patients’ data, which they may claim is impossible or entirely impractical because of the enormous number of “data subjects”. The alternative is to anonymise the data so that it is no longer considered personal information. In this scenario, access to the data is controlled by mechanisms much like those used in primary analysis. While the requirement to de-identify the data must still be observed, the risk of re-identification attempts is considered minimal.
External secondary research
Sharing data with external researchers under strict contracts and through secure means will supposedly ensure that the process is safe and the risks involved are very low. The anonymisation process will already have considered the probability of de-anonymisation attempts (a rogue employee, a data breach, etc.) and taken it into account in finding and applying an adequate level of anonymisation. Introducing contractual controls and limitations on how the data is accessed, used and disposed of is thought to significantly limit the data recipient's motivation to attempt de-anonymisation and illicit use of the data.
Public release
When data is released to the public domain, there is no control over how it will be used, and an adversary wanting to access the data can do so with little effort. The data industry finds it hard to assess the motivations of adversaries and the level of knowledge and tools they may possess and use. This uncertainty may push data guardians into an overly protective state of mind, which may in turn lead to a significant loss of data usability. The industry therefore considers it crucial to identify plausible adversaries relevant to the context and contents of the data release, which it believes may result in less de-anonymisation and greater retention of data utility.
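The effect of the release context on the required level of anonymisation can be illustrated with a small sketch. It is not a method described in the text, and it rests on several assumptions: that overall risk is the probability of a re-identification attempt multiplied by the probability that the attempt succeeds, that an attempt against a group of k indistinguishable records succeeds with probability 1/k, and that the probabilities and the acceptable-risk threshold below are hypothetical values chosen only for illustration. The names `ReleaseContext`, `CONTEXTS` and `min_group_size` are likewise invented for this example.

```python
from dataclasses import dataclass
from fractions import Fraction
import math


@dataclass(frozen=True)
class ReleaseContext:
    name: str
    # Assumed probability that a recipient attempts re-identification.
    p_attempt: Fraction


# Illustrative values only: assumptions for this sketch, not figures from the text.
CONTEXTS = [
    ReleaseContext("internal secondary research", Fraction("0.05")),
    ReleaseContext("external secondary research", Fraction("0.2")),
    ReleaseContext("public release", Fraction(1)),
]


def min_group_size(context: ReleaseContext, acceptable_risk: Fraction) -> int:
    """Smallest group size k such that p_attempt * (1 / k) <= acceptable_risk."""
    return max(1, math.ceil(context.p_attempt / acceptable_risk))


if __name__ == "__main__":
    threshold = Fraction("0.05")  # hypothetical acceptable overall risk
    for ctx in CONTEXTS:
        print(f"{ctx.name}: group size of at least {min_group_size(ctx, threshold)}")
```

Under these assumed values the required group size grows from internal to external to public release. This mirrors the point above: the less control there is over the recipient, the more anonymisation is needed and the more data utility is likely to be lost.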