GDPR non-compliance

The General Data Protection Regulation (GDPR) applies in all EU member states and to all organisations that process the personal data of individuals in the EU. It replaced the 1995 Data Protection Directive on 25 May 2018.

  • The GDPR applies to all personal data, whether it is held as structured or unstructured content.

  • A search engine is a “controller” of personal data.

  • Indexing information by a search engine is “processing of personal data”.

  • The GDPR applies even if the indexing happens outside the EU.

GDPR compliance

In general

  • Requests must be dealt with within one month.

  • The time to respond can be extended by two months if the request is complex or the data controller has received a number of requests from the individual.

  • The GDPR does not specify how to make a valid request. A request can be made verbally or in writing, and it can be made to any part of an organisation (including via social media); it does not have to go to a specific person or contact point. As long as it is clear what the person is asking for, the request does not have to include phrases such as “Article 15” or “access request”.

  • A controller that has doubts about the identity of the person making the request can ask for the additional information needed to confirm who the person is. The key to this is proportionality, taking into account what the request is for, the nature of the data held, and what it is being used for.
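The response deadlines above can be tracked with a small date calculation. The following is a minimal sketch, not legal advice: it assumes the month is a calendar month counted from the date of receipt, that a deadline falling on a non-existent date (for example 31 February) moves to the last day of that month, and that the two-month extension is counted from the same receipt date. The function names `add_months` and `response_deadline` are illustrative.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def response_deadline(received: date, extended: bool = False) -> date:
    """One month to respond; three months in total if the extension applies (assumption)."""
    return add_months(received, 3 if extended else 1)


print(response_deadline(date(2023, 1, 31)))                 # 2023-02-28
print(response_deadline(date(2023, 1, 31), extended=True))  # 2023-04-30
```

Whether weekends and public holidays shift the deadline depends on the guidance of the relevant supervisory authority, so that rule is deliberately left out of the sketch.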

Right to be informed (privacy policy)

Anyone has the right to be informed about how their data is processed. This right is designed to ensure transparency over how personal data is used. It places an obligation on controllers to provide “fair processing information”, which includes the purposes for processing the personal data, the retention periods for that data, and who it will be shared with.

  • This information must be provided at the time the personal data is collected. When personal data is obtained from other sources, individuals must be provided with the information within a reasonable period of obtaining the data and no later than one month.

  • If an individual already has the information or if it would involve a disproportionate effort to provide it to them, the information does not need to be provided.

  • The information must be concise, transparent, intelligible, easily accessible, and it must use clear and plain language.

  • Search engine websites typically include this in their terms of use or privacy policy. Most private search engines are already open about their data collection and use, which in most cases is very limited.

  • If possible, provide privacy information to people using a combination of different techniques including layering, dashboards, just-in-time notices, icons, and mobile and smart device functionalities. Test it with some real users.

  • Review and update the privacy information regularly, and bring any new uses of personal data to the attention of the individuals affected before processing starts.

Right of access

Users can ask a search engine whether their data is being collected. They have the right to obtain confirmation that their personal data is being processed, a copy of that data, and other supplementary information.

  • The supplementary information covers:

    • the purposes of the processing;

    • the categories of personal data concerned;

    • the recipients or categories of recipient the personal data is disclosed to;

    • the retention period for storing the personal data or, where this is not possible, the criteria for determining how long it will be stored;

    • the existence of their right to request rectification, erasure or restriction, or to object to such processing;

    • the right to lodge a complaint with the ICO or another supervisory authority;

    • information about the source of the data, where it was not obtained directly from the individual;

    • the existence of any automated decision-making (including profiling);

    • the safeguards provided if personal data is transferred to a third country or international organisation.

  • Private search engines (and their users) typically have very little information to share in the first place. Internet search engines, however, do.

Right to rectification

This is an important right. The ability to rectify incomplete or incorrect data helps to combat incorrect data profiling. Previously, users often had no control over their data, and errors in it could cost them jobs, loans, and so on. People can ask to have their information corrected or completed. The data gatherer is then responsible for passing the corrected information to any third parties they previously shared the data with.

  • The GDPR does not give a definition of the term accuracy. However, the Data Protection Act 2018 (DPA 2018) states that personal data is inaccurate if it is incorrect or misleading as to any matter of fact. Determining whether personal data is inaccurate can be more complex if the data refers to a mistake that has subsequently been resolved. It is also complex if the data in question records an opinion. Opinions are, by their very nature, subjective, and it can be difficult to conclude that the record of an opinion is inaccurate.

  • Under Article 18 an individual has the right to request restriction of the processing of their personal data where they contest its accuracy and a controller is checking it.

  • A data controller can refuse to comply with a request for rectification if the request is manifestly unfounded or excessive, taking into account whether the request is repetitive in nature. In that case, the person who requested rectification must be informed without undue delay, and within one month of receipt of the request, about the reasons for not taking action, their right to make a complaint to the ICO or another supervisory authority, and their ability to seek to enforce this right through a judicial remedy.

Right to erasure

Users have the right to request that their information is deleted if there is no longer a reason for it to be stored. This isn’t a complete “right to be forgotten”, but it does provide protection from outdated information staying on the internet. It allows individuals to have information, videos, or photographs about themselves deleted from certain internet records so that search engines cannot find them. The right is not absolute and only applies in certain circumstances, and it is not the only way in which the GDPR obliges controllers to consider whether to delete personal data.

  • In the case of search engines, individuals have the right to have their personal data erased if

    • a search engine is relying on legitimate interests as its basis for processing, the individual objects to the processing of their data, and there is no overriding legitimate interest to continue this processing.

    • processing the personal data is done for direct marketing purposes and the individual objects to that processing.

  • The GDPR requires that controllers can accurately and completely identify a person’s information across the entire organisation, so that the information of a natural person can be fully removed when they ask for it to be erased. For that purpose an NLP-based private search engine can be quite handy to have.

Right to restrict processing

The right to restrict processing gives natural persons the right to request the restriction or suppression of their personal data. Organisations that hold the data can continue to store it, but they are no longer permitted to process it in other ways. This is an alternative to requesting the erasure of data. Although this right is distinct from the right to rectification and the right to object, there are close links between them: the person making the request can ask for processing to be restricted while their other request is being considered.

  • The GDPR suggests a number of different methods that could be used to restrict data, such as temporarily moving the data to another processing system, making the data unavailable to users or temporarily removing published data from a website. Only the latter two make sense in the context of search engines.
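For a search engine, "making the data unavailable to users" can be approximated by flagging indexed documents and filtering them out at query time, while leaving the stored record untouched. The sketch below uses a toy in-memory index; the `Document` and `Index` classes and their methods are hypothetical and do not correspond to any particular search product.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    restricted: bool = False  # True while a restriction request is in force


class Index:
    """Toy index: documents stay stored, restricted ones are hidden from search."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc.doc_id] = doc

    def restrict(self, doc_id):
        self._docs[doc_id].restricted = True

    def lift_restriction(self, doc_id):
        self._docs[doc_id].restricted = False

    def search(self, term):
        term = term.lower()
        return [d.doc_id for d in self._docs.values()
                if not d.restricted and term in d.text.lower()]


index = Index()
index.add(Document("a1", "Profile of Jane Doe"))
index.add(Document("a2", "Jane Doe wins award"))
index.restrict("a1")
print(index.search("jane doe"))  # ['a2']
```

Keeping the flag on the stored record makes it straightforward to lift the restriction once the related rectification request or objection has been decided.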

Right to data portability

The right to data portability allows individuals to obtain and reuse their personal data for their own purposes across different services. Lock-in is still the norm rather than the exception when it comes to online platforms. Many companies strive for a competitive edge by exclusively collecting and processing data and by keeping their systems closed. The main goal of this right is to give data subjects more control over their personal data and to increase user choice of online services.

  • The right to data portability only applies to personal data. This means that it does not apply to genuinely anonymous data. However, pseudonymous data that can be clearly linked back to an individual (where that individual provides the respective identifier) is within scope of the right.

  • Sometimes the personal data an individual has provided is easy to identify; in other cases it is less so. It includes personal data resulting from observation of an individual’s activities (such as when using a device or service), for example the history of website usage or search activities, traffic and location data, or ‘raw’ data processed by connected objects such as smart meters and wearable devices.

  • It does not include any additional data created based on the data an individual has provided. For example, if the data provided was used to create a user profile then this data would not be in scope of data portability.

  • If the requested information includes information about others (third party data) a controller needs to consider whether transmitting that data would adversely affect the rights and freedoms of such third parties.

  • Individuals have the right to ask a controller to transmit their personal data directly to another controller, without the controller putting in place legal, technical or financial obstacles that slow down or prevent the transmission. There may, however, be legitimate reasons why the transmission cannot be done, for example because it would adversely affect the rights and freedoms of others.

  • It is recommended to provide personal data using open formats such as CSV, XML and JSON.
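As an illustration of the open formats mentioned above, the following sketch serialises a user's search history to JSON and CSV using only the Python standard library. The record layout (`timestamp`, `query`) and the function names are assumptions made for the example; a real export would use whatever fields the controller actually holds.

```python
import csv
import io
import json

# Hypothetical records that a user provided or that were observed about them.
records = [
    {"timestamp": "2023-05-01T09:15:00Z", "query": "weather amsterdam"},
    {"timestamp": "2023-05-01T09:17:30Z", "query": "train times, utrecht"},
]


def to_json(rows):
    """Serialise the rows as a JSON array, keeping non-ASCII characters readable."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows):
    """Serialise the rows as CSV with a header line; the csv module handles quoting."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["timestamp", "query"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_json(records))
print(to_csv(records))
```

Both outputs can be read back by standard parsers, which is what makes them suitable for handing to the individual or to another controller.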

Right to object

Natural persons are permitted to opt out of having their data processed. If data processors do not have a legitimate and compelling reason to process an individual’s data, users can object for any number of reasons. Search engines rarely have a legitimate and compelling reason to process individuals’ data. Delivering positive user experiences does not need to come at the expense of privacy.

  • Individuals have an absolute right to object to the processing of their personal data if it is for direct marketing purposes.

  • The right to object is not absolute when the processing is for a task carried out in the public interest, the exercise of official authority or legitimate interests (or those of a third party). When processing data for scientific or historical research, or statistical purposes, the right to object is also more limited.


Ingest all structured and unstructured information from the entire corpus and search it for personally identifiable information (PII) using NLP techniques and tools. No single approach can establish the presence of PII. It takes the combined effort of multiple approaches that use contexts, tags, character patterns, dictionaries, B-tree indexes, n-grams, regular expressions, and spell checkers trained on specific contexts. Use structured and unstructured signals with machine learning to match person records across the multiple sources. The aim is to de-anonymise and re-identify individuals so that the associated data can be removed, suppressed, or transmitted on request.
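Two of the approaches named above, regular expressions and dictionaries, can be sketched in a few lines. This is a deliberately simplified illustration and not a complete PII detector: the patterns are rough assumptions that will produce both false positives and misses, the `KNOWN_NAMES` dictionary is hypothetical, and linking sources by an exact shared identifier stands in for the machine-learning record matching described in the text.

```python
import re
from collections import defaultdict

# Rough, assumption-laden patterns; real deployments need locale-specific rules.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
KNOWN_NAMES = {"jane doe", "john smith"}  # hypothetical name dictionary


def find_pii(text):
    """Return (kind, value) pairs for every email, phone number, or known name found."""
    hits = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    hits += [("phone", m.group()) for m in PHONE_RE.finditer(text)]
    lowered = text.lower()
    for name in sorted(KNOWN_NAMES):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            hits.append(("name", text[m.start():m.end()]))
    return hits


def link_sources(sources):
    """Map each detected identifier to the names of the sources that mention it."""
    linked = defaultdict(set)
    for source_name, text in sources.items():
        for kind, value in find_pii(text):
            linked[(kind, value.lower())].add(source_name)
    return dict(linked)


sources = {
    "crm": "Jane Doe <jane.doe@example.org>",
    "tickets": "Reply to JANE.DOE@example.org",
}
print(find_pii("Contact Jane Doe at jane.doe@example.org or +31 20 123 4567."))
print(link_sources(sources))
```

The output of `link_sources` shows which sources share an identifier, which is the starting point for erasing, restricting, or exporting everything held about one person.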