Australian Government - Office of the Australian Information Commissioner - Home

Information Policy Agency Resource 1 — De-identification of data and information - consultation draft April 2013

An agency’s information is one of its most valuable assets. The Freedom of Information Act 1982 declares that government information is a national resource that should be managed for public purposes.[1]

Releasing agency information and data (information assets) for reuse by the public can:

  • strengthen government transparency
  • assist public participation in government processes
  • drive innovation and economic growth.

The OAIC Principles on open public sector information[2] support an open government/open data culture, by providing guidance for agencies on how to manage government information assets as a national resource for public purposes.

Agencies must also observe obligations imposed by the Privacy Act 1988 in handling personal information. In particular, the Act requires agencies to take reasonable steps in some circumstances to destroy or de-identify personal information. This is particularly important where information assets that are proposed to be published or shared with others contain personal information. De-identifying that information can maximise the utility and value of the information assets without compromising privacy or confidentiality.

The obligation to take reasonable steps to de-identify personal information is spelt out in other guidance material. For example, the Privacy Commissioner’s Tax File Number Guidelines issued under s 17 of the Privacy Act require reasonable steps to securely destroy or permanently de-identify tax file number information where it is no longer required by law to be retained, or no longer necessary for a purpose under taxation law, personal assistance law or superannuation law (including the administration of such law).[3]

From 12 March 2014 new de-identification obligations under Australian Privacy Principles 4 and 11 of the Privacy Act will also apply. More information about reforms to the Privacy Act is available at www.oaic.gov.au.

What is de-identification?

De-identification is a process by which a collection of data or information (for example, a dataset) is altered to remove or obscure personal identifiers and personal information (that is, information that would allow the identification of individuals who are the source or subject of the data or information).

De-identification can be a useful technique for protecting privacy and confidentiality when releasing information or data. Nevertheless, de-identification is not infallible; it may be possible in some circumstances to re-identify data or information by matching it with other datasets or information.[4] However, where de-identification is administered to a high standard, together with appropriate risk management strategies, the risk of re-identification can be minimised.

Why should you de-identify?

De-identifying information assets:

  • is required by the Privacy Act in some cases
  • enables information or data that includes personal information to be released in a de-identified form for use by others, such as researchers[5]
  • helps protect confidential information and data
  • enables agencies to be transparent
  • can lessen the risk that personal information will be compromised when an information asset is exposed to unauthorised access, use or distribution (that is, a data breach) – for example, where:
    • an intruder gains unauthorised access to information assets
    • an employee accesses information assets without authorisation
    • data or an information storage device is lost or stolen.

When to de-identify

Agencies should consider whether there is a need to de-identify information assets when:

  • collecting and storing personal information
  • publishing material that may contain personal information
  • sharing information with another agency or organisation, or
  • sharing information between different sections of the agency.

As a general rule, if an information asset does not need to include personal identifiers, it should be de-identified.

Personal information does not need to be de-identified if it is required to meet a business objective or function. For example, an agency that provides services to individuals may need to retain their personal information to deliver tailored services to them. By contrast, if customer information is provided to another section of the agency to assist in developing education or policy materials, and the identity of the customer is not relevant, that information should be de-identified.

In determining the necessary level of de-identification, agencies should consider:

  • what kind of information or data is contained in the information asset
  • who will have access to the information asset, and why
  • whether the information asset contains unique or uncommon characteristics (quasi-identifiers) that could enable re-identification
  • whether the information or data will be targeted for re-identification because of who or what it relates to
  • whether there is other information or data available that could be used to re-identify the de-identified data or information
  • what harm may result if the information or data is re-identified.
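
One way to explore the quasi-identifier question above is to count how many records share each combination of quasi-identifier values. The following is a minimal sketch only, not a method prescribed by this resource. The field names, sample records and the threshold of 2 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical records; field names and values are made up for illustration.
records = [
    {"age_band": "25-34", "postcode": "2600", "profession": "nurse"},
    {"age_band": "25-34", "postcode": "2600", "profession": "nurse"},
    {"age_band": "35-44", "postcode": "2600", "profession": "teacher"},
    {"age_band": "35-44", "postcode": "2601", "profession": "judge"},
]


def small_groups(rows, quasi_identifiers, threshold):
    """Return quasi-identifier combinations shared by fewer than `threshold` rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Combinations held by a single record are candidates for closer review.
risky = small_groups(records, ["age_band", "postcode", "profession"], threshold=2)
for combo, n in risky.items():
    print(combo, n)
```

A combination that appears only once is not necessarily identifiable, and a larger group is not necessarily safe. The count is a prompt for further assessment against the other factors listed above, such as what other information is available and what harm re-identification could cause.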

Further information on identification risk factors and methods for assessing identification risks is provided in the National Statistical Service’s Confidentiality Information Sheet 5 – Managing the risk of disclosure in the release of microdata.

In some cases, de-identifying an information asset may reduce the usefulness of the asset. Nevertheless, de-identification may be necessary to ensure that personal or confidential information is not disclosed.

How to de-identify

De-identification techniques should be carefully chosen, based on the outcomes of a risk assessment, to ensure that personal information is protected and that the information asset will still be useful for its intended purpose after it is de-identified.

Appropriate techniques could include:

  • Removing or modifying identifying features such as a person’s name, address and date of birth.
  • Removing or modifying quasi-identifiers (for example, gender, significant dates, profession, income). Carefully consider whether quasi-identifiers should be de-identified and whether some may need to be retained for the information or data to continue to be meaningful and usable.
  • Suppressing data, to alter statistically narrow, quasi-identifiable data values, or to associate data with broader categorisations[6] — for example, changing ‘age = 27’ to ‘age = 25–35’.
  • Combining data categories to re-cast categories that contain small values that could assist in identifying individuals. For example, values for 18-24 year olds could be combined with values for 25-30 year olds into a single category of 18-30 year olds.
  • Manufacturing ‘synthetic data’, which can be generated from original data and then substituted for it, while preserving the value of the original data.[7] For example, a synthetic dataset could be used for system testing a program that detects fraud. The synthetic data will be based on, and replicate, the patterns found in an authentic dataset of normal use and fraud, but need not contain any personal information — this allows systems to be tested with data that is realistic without compromising the privacy of individuals.
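
Some of the techniques above can be sketched in a few lines of code. The example below is illustrative only. It assumes hypothetical field names (`name`, `address`, `date_of_birth`, `age`), uses ten-year age bands rather than the 25–35 band mentioned above, and uses made-up category counts. It shows removing direct identifiers, replacing an exact age with a broader band, and combining small categories.

```python
# Assumed names of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def age_band(age, width=10):
    """Map an exact age to a band such as '20-29'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def de_identify(record):
    """Return a copy without direct identifiers and with age generalised."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age_band"] = age_band(out.pop("age"))
    return out


def combine_categories(counts, mapping):
    """Merge category counts according to `mapping` (old label -> new label)."""
    merged = {}
    for category, n in counts.items():
        target = mapping.get(category, category)
        merged[target] = merged.get(target, 0) + n
    return merged


# Hypothetical record and category counts, made up for illustration.
record = {
    "name": "Jane Citizen",
    "address": "1 Example St",
    "date_of_birth": "1986-02-03",
    "age": 27,
    "profession": "teacher",
}
result = de_identify(record)
print(result)

merged = combine_categories(
    {"18-24": 3, "25-30": 42, "31-40": 57},
    {"18-24": "18-30", "25-30": "18-30"},
)
print(merged)
```

Whether a retained field such as profession is itself a quasi-identifier that needs further treatment depends on the risk assessment. The sketch does not make that decision.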

Consider other steps that can be taken to manage and minimise the risk of re-identification. These may include:

  • requiring the data or information receiver to sign a contract limiting the use and distribution of the information or data, and enforcing the terms of that contract
  • limiting the access to information or data by, for example, allowing other agencies or organisations to view the data rather than providing a copy, or running an analysis of the data and providing the result rather than the raw data.
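
As an illustration of providing a result rather than the raw data, the sketch below returns a table of counts instead of the underlying records. It also withholds counts below a minimum cell size, which is an added assumption and not a requirement stated in this resource. The field name, sample rows and the threshold of 5 are hypothetical.

```python
from collections import Counter


def summary_table(rows, field, min_cell=5):
    """Count rows per value of `field`; counts below `min_cell` become None."""
    counts = Counter(row[field] for row in rows)
    return {value: (n if n >= min_cell else None) for value, n in counts.items()}


# Hypothetical raw data held by the agency and not released.
rows = [{"state": "NSW"}] * 6 + [{"state": "ACT"}] * 2

# Only this summary would be provided to the recipient.
table = summary_table(rows, "state")
print(table)
```

A withheld cell can sometimes be worked out from published totals, so a summary like this still needs to be assessed for re-identification risk before release.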

Further advice on, and examples of, different methods of de-identification and of restricting access to data are provided in the National Statistical Service Confidentiality Information Sheet 4 – How to confidentialise data: the basic principles and Confidentiality Information Sheet 5 – Managing the risk of disclosure in the release of microdata.

Assessing the risks of re-identification

Before releasing information or data, agencies should confirm whether de-identification has been successful:

  • Apply the ‘motivated intruder’ test. This test considers whether a reasonably competent, motivated person with no specialist skills would be able to re-identify individuals from the data or information (the specific motivation of the intruder is not relevant). It assumes that the motivated intruder would have access to resources such as the internet and all public documents, and would make reasonable enquiries to gain more information.[8]
  • Look at re-identification ‘in the round’ — that is, assess whether any agency, organisation or member of the public could identify any individual from the data or information being released — either in itself or in combination with other available information or data.[9]

Depending on the outcome of the risk analysis and the de-identification process, information custodians may need to engage an expert to undertake an assessment of the information asset to ensure the risk of re-identification is low.

The risk of re-identification may shift as technologies develop and a greater amount of data and information is published or obtained by an agency or organisation. Agencies and organisations should regularly re-assess the risk of re-identification and, if necessary, take further steps to minimise the risk. This may include:

  • assessing whether there is a need to have historical information or data published on a website
  • assessing whether a higher level of de-identification is required.

Further Resources

The National Statistical Service, representing all government agencies, and led by the Australian Bureau of Statistics, has produced a Confidentiality Information Series which is designed to explain, and provide advice on, a range of issues around confidentialising data, including basic techniques to confidentialise data and manage risks.

The Australian National Data Service provides materials on techniques for de-identification and ethical considerations.

The information provided in this agency resource is of a general nature. It is not a substitute for legal advice.

 

Footnotes

[1] Freedom of Information Act 1982 (Cth) s 3(3)

[2] Office of the Australian Information Commissioner, Principles on open public sector information, published May 2011, Office of the Australian Information Commissioner website www.oaic.gov.au/publications/agency_resources/principles_on_psi_short.html

[3] Office of the Australian Information Commissioner, Tax File Number Guidelines 2011, published December 2011, Comlaw website, www.comlaw.gov.au/Details/F2011L02748; and Privacy Fact Sheet 6 — The binding Tax File Number Guidelines 2011 and the protection of tax file number information, last modified March 2012, Office of the Australian Information Commissioner website, www.oaic.gov.au/publications/privacy_fact_sheets/privacy_fact_sheet6_TFN_guide_2011.html

[4] Ann Cavoukian and Khaled El Emam, Dispelling the myths surrounding de-identification: anonymization remains a strong tool for protecting privacy, published June 2011, Information and Privacy Commissioner Ontario website www.ipc.on.ca/images/Resources/anonymization.pdf, p 4; and Paul Ohm, ‘Broken promises of privacy: responding to the surprising failure of anonymization’ (2010) 57 University of California Los Angeles Law Review 1701, p 1744

[5] In Autism Aspergers Advocacy Australia and Department of Families, Housing, Community Services and Indigenous Affairs [2012] AICmr 28 the Australian Privacy Commissioner found that de-identification can be used to protect an individual’s privacy in response to a request under the Freedom of Information Act 1982 (Cth). The Commissioner has also released the Privacy Guidelines for the Medicare Benefits and Pharmaceutical Benefits Programs under s 135AA of the National Health Act 1953, which establish when identifiable Medicare Benefits or Pharmaceutical Benefits claims information can be disclosed for the purposes of medical research. The guidelines are available at www.privacy.gov.au/materials/types/guidelines/view/6576

[6] National Statistical Service Confidentiality Information Sheet 4: How to confidentialise data: the basic principles

[7] United Kingdom Information Commissioner’s Office 2012 Anonymisation: managing data protection risk code of practice, United Kingdom Information Commissioner’s Office, Wilmslow, Appendix 2, p 53

[8] United Kingdom Information Commissioner’s Office, p 22

[9] United Kingdom Information Commissioner’s Office, p 19