Data masking: Anonymisation or pseudonymisation?


Among the arsenal of IT security techniques available, data masking, in the form of pseudonymisation or anonymisation, is strongly encouraged by the GDPR. Such techniques reduce risk and assist “data processors” in fulfilling their compliance obligations.


If it can be proven that the true identity of the individual cannot be derived from anonymised data, then this data is exempt from the other measures otherwise required to keep the actual data strictly confidential.

The two techniques differ, and under the GDPR the choice between them will depend on the degree of risk and on how the data will be processed.

What is pseudonymisation?

Pseudonymisation enhances privacy by replacing most identifying fields within a data record with one or more artificial identifiers, or pseudonyms. There can be a single pseudonym for a collection of replaced fields, or one pseudonym per replaced field.

Specifically, the GDPR defines pseudonymisation in Article 4(5) as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information.” To pseudonymise a data set, the “additional information” must be “kept separately and [be] subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.”
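As an illustration of this definition, here is a minimal Python sketch in which identifying fields are replaced by random pseudonyms and the mapping back to the original values is returned as a separate object. The function name `pseudonymise`, the field names and the sample records are assumptions made for the example; they do not come from the GDPR or from any particular product.

```python
import secrets

def pseudonymise(records, fields):
    """Replace the given fields with random pseudonyms.

    Returns (pseudonymised_records, lookup). The lookup maps each
    pseudonym back to the original value and is the "additional
    information" that must be stored separately and protected.
    """
    forward = {}  # (field, original value) -> pseudonym, keeps repeats consistent
    lookup = {}   # pseudonym -> original value
    output = []
    for record in records:
        masked = dict(record)
        for field in fields:
            key = (field, record[field])
            if key not in forward:
                pseudonym = f"{field}_{secrets.token_hex(8)}"
                forward[key] = pseudonym
                lookup[pseudonym] = record[field]
            masked[field] = forward[key]
        output.append(masked)
    return output, lookup

# Hypothetical sample data
customers = [
    {"name": "Alice Martin", "email": "alice@example.com", "basket": 42.50},
    {"name": "Bob Durand", "email": "bob@example.com", "basket": 17.00},
    {"name": "Alice Martin", "email": "alice@example.com", "basket": 8.90},
]

masked, lookup = pseudonymise(customers, ["name", "email"])
```

In this sketch the same customer always receives the same pseudonym, so the masked records can still be grouped or joined. Anyone holding only `masked` cannot attribute a record to a person, whereas anyone who also holds `lookup` can, which is why the two would need to be kept apart.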

Data masking: Pseudonymisation or anonymisation?

The legal distinction between anonymised and pseudonymised data lies in whether the data is still categorised as personal data. Pseudonymous data still allows for some form of re-identification (even indirect and remote) and therefore remains personal data, while anonymous data cannot be re-identified and falls outside the scope of the GDPR.

Pseudonymisation techniques differ from anonymisation techniques. With anonymisation, the data is scrubbed for any information that may serve as an identifier of a data subject. Pseudonymisation does not remove all identifying information from the data but merely reduces the linkability of a dataset with the original identity of an individual (e.g., via an encryption scheme).

Both pseudonymisation and anonymisation are encouraged in the GDPR and help its requirements to be met. These techniques should, therefore, be applied systematically and on a recurring basis. Those in possession of personal data should implement one or the other of these techniques to minimise risk, and automation can reduce the cost of compliance.

Which data should be anonymised?

By definition, data anonymisation techniques seek to conceal identity and thus identifiers of any nature. Identifiers can apply to any natural or legal person, living or dead, including their dependents, ascendants and descendants, as well as other persons related to them directly or through interaction.

For example:

Family names, patronyms, first names, maiden names, aliases
Postal addresses, telephone numbers, postal codes and cities
IDs: social security number (e.g. fiscal code in Italy, National Insurance number in the UK), bank account details (e.g. IBAN), credit card numbers, valid keys, partial anonymisation.

Which techniques are available for anonymising data?

A variety of methods are available and again the choice will depend on the degree of risk and the intended use of the data.

Directory replacement

A directory replacement method involves replacing the names of individuals within the data with substitute values, while maintaining consistency between related values, such as “postcode + city”.
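One possible way to implement directory replacement is sketched below in Python. The directories, the salt and the record layout are hypothetical, and picking entries by hashing the original value is just one design choice among several. The point is that the postcode and city are replaced as a pair, so the two fields remain consistent with each other.

```python
import hashlib

# Hypothetical replacement directories; real ones would be far larger.
NAME_DIRECTORY = ["Camille Leroy", "Dominique Petit", "Claude Moreau", "Sacha Roux"]
PLACE_DIRECTORY = [("74000", "Annecy"), ("69001", "Lyon"), ("38000", "Grenoble")]

def pick(directory, original, salt="demo-salt"):
    """Deterministically choose a directory entry for an original value."""
    digest = hashlib.sha256((salt + original).encode("utf-8")).digest()
    return directory[int.from_bytes(digest[:8], "big") % len(directory)]

def replace_from_directory(record):
    """Return a copy of record with name and place taken from the directories."""
    replaced = dict(record)
    replaced["name"] = pick(NAME_DIRECTORY, record["name"])
    # Postcode and city are drawn together so the pair stays consistent.
    place_key = record["postcode"] + "|" + record["city"]
    replaced["postcode"], replaced["city"] = pick(PLACE_DIRECTORY, place_key)
    return replaced

# Hypothetical sample record
person = {"name": "Jean Dupont", "postcode": "75001", "city": "Paris"}
result = replace_from_directory(person)
```

Because the choice is deterministic, the same original name always maps to the same replacement, which preserves joins across tables. With directories this small, different individuals will often share a replacement, so a realistic directory would need many more entries.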

Scrambling

Scrambling techniques involve mixing or obfuscating letters. The process can sometimes be reversible. For example, Annecy could become Yneanc.
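The reversibility mentioned above can be illustrated with a short Python sketch that shuffles characters using a seeded generator. The function names and the seed value are assumptions for the example, and Python's `random` module is not a cryptographic generator, so this shows the principle rather than a secure scheme.

```python
import random

def scramble(text, seed):
    """Shuffle the characters of text using a seeded generator."""
    rng = random.Random(seed)
    order = list(range(len(text)))
    rng.shuffle(order)
    return "".join(text[i] for i in order)

def unscramble(scrambled, seed):
    """Reverse scramble() by regenerating the same permutation."""
    rng = random.Random(seed)
    order = list(range(len(scrambled)))
    rng.shuffle(order)
    original = [""] * len(scrambled)
    for position, source_index in enumerate(order):
        original[source_index] = scrambled[position]
    return "".join(original)

scrambled = scramble("Annecy", seed=7)
restored = unscramble(scrambled, seed=7)
```

Anyone who knows the seed can undo the scrambling, which is what makes this variant reversible. Short values also remain easy to guess from their letters alone, so scrambling on its own offers limited protection.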

Masking

A masking technique allows part of the data to be hidden with random characters or other data. For example, pseudonymisation can be achieved by masking identities or important identifiers. The advantage of masking is that the data remains recognisable and usable without exposing actual identities.
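A common form of masking hides all but the last few characters of an identifier. The Python sketch below assumes a fixed mask character and a hypothetical helper named `mask`; the card number is a widely used test value, not real data.

```python
def mask(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` characters of value."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]

card = mask("4111111111111111")        # "************1111"
phone = mask("0450123456", visible=2)  # "********56"
```

The masked value keeps its length and its final digits, so a support agent could still confirm which card is being discussed without ever seeing the full number.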

Personalised anonymisation

This method allows users to apply their own anonymisation technique. Custom anonymisation can be carried out using scripts or an application.
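A custom script of this kind might look like the following Python sketch, in which each field is assigned its own rule. The rules, field names and the `anonymise` helper are purely illustrative assumptions.

```python
def anonymise(record, rules):
    """Apply a per-field rule; fields without a rule are copied unchanged."""
    return {
        field: rules.get(field, lambda v: v)(value)
        for field, value in record.items()
    }

# Hypothetical rules chosen for this example
rules = {
    "name": lambda v: "REDACTED",
    "phone": lambda v: v[:3] + "*" * (len(v) - 3),
    "postcode": lambda v: v[:2] + "000",
}

# Hypothetical sample record
subscriber = {"name": "Jean Dupont", "phone": "0450123456",
              "postcode": "74940", "plan": "premium"}
cleaned = anonymise(subscriber, rules)
```

Keeping the rules in a single table makes it easy to review which fields are treated and how, and to add a rule when a new identifying field appears.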

Blurring

Data blurring replaces data values with approximations so that the exact values can no longer be recovered and/or individuals can no longer be identified.
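Blurring can be sketched in Python as a set of small functions that coarsen values: an age becomes a band, a date loses its day, and an amount is rounded. The band width, rounding step and function names are assumptions for the example, and the right granularity would depend on the data set.

```python
from datetime import date

def blur_age(age, band=10):
    """Replace an exact age with the band it falls into, e.g. 37 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"

def blur_date(d):
    """Keep only the year and month by fixing the day to the 1st."""
    return d.replace(day=1)

def blur_amount(amount, step=1000):
    """Round an amount to the nearest multiple of step."""
    return round(amount / step) * step

age_band = blur_age(37)                        # "30-39"
birth_month = blur_date(date(1986, 7, 23))     # date(1986, 7, 1)
salary = blur_amount(48250)                    # 48000
```

The blurred values remain useful for statistics, such as age distributions or salary ranges, while no longer pointing to one exact individual.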

Data masking versus data encryption: A comparison of two pseudonymisation methods

Unlike data masking, data encryption translates data into another form, or code, so that only people with access to a secret key (formally called a decryption key) or password can read it.

Data masking is a more widely applicable solution as it enables organisations to maintain the usability of their customer data.

Security of data during transfer – Data masking? No. Encryption? Yes.
Security of static data – Data masking? Yes. Encryption? Yes.
Continuous availability of data for applications – Data masking? Yes. Encryption? No.

Data masking is the standard solution for data pseudonymisation. Using masking, data can be de-identified and de-sensitised so that personal information remains anonymous in the context of support, analytics, testing, or outsourcing.

By Olenka Van Schendel, vice president of strategic marketing & business development at Arcad Software.