De-identification of personal information can enable information to be shared or published without jeopardising personal privacy. This enables agencies to maximise the utility and value of government information assets while safeguarding privacy and confidentiality.
The Freedom of Information Act 1982 declares that government information is a national resource that is to be managed for public purposes. Agencies are urged to take a pro-disclosure approach by actively publishing data and information (known as information assets).
If an information asset includes personal information, an agency must also comply with the Australian Privacy Principles (APPs) in the Privacy Act 1988. The APPs regulate how agencies collect, use, disclose and store personal information. Importantly, APP 6 limits the disclosure of personal information.
De-identification is a tool that advances both transparency and privacy objectives.
This resource provides general advice about de-identification, to assist agencies in balancing those objectives in information management. Guidance is provided on when de-identification may be appropriate, how to choose appropriate de-identification techniques, and how to assess the risk of re-identification. This resource is not legally binding, and should be considered in the context of other requirements of the Privacy Act and other legislation.
General obligations under the Privacy Act and Archives Act
The APPs support de-identification of personal information in specified circumstances. For example, if an entity to which the Privacy Act applies no longer needs personal information for any purpose for which it was collected or may be used, the entity must take reasonable steps to destroy or de-identify the information (APP 11.2).
That obligation does not apply if the personal information is contained in a Commonwealth record. As a general rule, a Commonwealth record can only be destroyed or altered in accordance with s 24 of the Archives Act 1983. The grounds on which this may be done include ‘normal administrative practice’ and destruction or alteration in accordance with an arrangement approved by the National Archives of Australia (often titled a ‘Records Disposal Authority’). It is also an offence under s 26 of the Archives Act to alter a Commonwealth record that is over 15 years old.
Before an agency de-identifies a record containing personal information it should consider these obligations under the Archives Act. An agency may be required to retain an original, unaltered copy of the record. This does not, however, prevent agencies from creating a de-identified version of the record to share or distribute in cases where privacy concerns would prevent disclosure of the original record.
Agencies should also be aware of the obligation imposed by the Tax File Number Guidelines issued under s 17 of the Privacy Act. The Guidelines require an agency to take reasonable steps to securely destroy or permanently de-identify tax file number information that is no longer required by law to be retained, or is no longer necessary for a purpose under taxation law, personal assistance law or superannuation law.
What is de-identification?
Personal information is ‘de-identified’ if the information is no longer about an identifiable individual or an individual who is reasonably identifiable.
De-identification involves removing or altering information that identifies an individual or is reasonably likely to do so. Generally, de-identification includes two steps:
- removing personal identifiers, such as an individual’s name, address, date of birth or other identifying information, and
- removing or altering other information that may allow an individual to be identified, for example, because of a rare characteristic of the individual, or a combination of unique or remarkable characteristics that enable identification.
De-identification can be effective in preventing re-identification of an individual, but may not remove that risk altogether. There may, for example, be a possibility that another dataset or other information could be matched with the de-identified information. The risk of re-identification must be actively assessed and managed to mitigate this risk. This should occur both before an information asset is de-identified and after disclosure of a de-identified asset (see ‘Assessing the risk of re-identification’ below).
The National Statistical Services’ (NSS’s) Confidentiality Information Series provides guidance about a process called ‘confidentialisation’. Confidentialisation involves both de-identifying data and then taking the additional step of assessing and managing the risk of indirect identification occurring in the de-identified dataset.
In practice, ensuring that an information asset has been ‘de-identified’ for the purpose of s 6 of the Privacy Act will require agencies to adopt a risk assessment approach similar to the one involved in confidentialisation.
Why should personal information be de-identified?
- is required by the Privacy Act in specified circumstances
- enables an information asset that includes personal information to be disclosed in a de-identified form that makes it available for use by researchers and others; this can:
- assist public participation in government processes
- inform policy and program development and design
- drive innovation and economic growth by creating new opportunities for commercial enterprise
- helps protect confidential information and data
- can lessen the risk that personal information will be compromised when an information asset is exposed to unauthorised access, use or distribution (that is, a data breach) — for example, where:
- an intruder gains unauthorised access to agency information
- an employee internally accesses information without authorisation
- data or an information storage device is lost or stolen.
When to de-identify
Personal information does not need to be de-identified if it is required for a purpose permitted under the APPs. An example is agency case files that are used to deliver tailored services to individuals and contain personal information about those individuals.
De-identification may need to be considered if:
- particular sections of the agency do not require access to the full information asset (for example, a team developing education or policy materials may only require access to a de-identified version of the information asset)
- the information asset is being shared with another entity
- the information asset is being published.
In de-identifying an information asset an agency may need to retain an original, unaltered copy of the record, to comply with record-keeping obligations (see ‘General obligations under the Privacy Act and Archives Act’ above).
In determining the necessary level of de-identification, an agency should consider:
- what kind of information or data is contained in the information asset
- who will have access to the information asset, and why
- whether the information asset contains unique or uncommon characteristics (quasi-identifiers) that could enable re-identification
- whether the information or data will be targeted for re-identification because of who or what it relates to
- whether there is other information or data available that could be used to re-identify the de-identified data or information
- what harm may result if the information or data is re-identified.
Further information on identification risk factors and methods for assessing identification risks is provided in the National Statistical Service’s Confidentiality Information Sheet 5 – Managing the risk of disclosure in the release of microdata.
In some cases, de-identifying an information asset may reduce the utility of the asset. Nevertheless, de-identification may be necessary to minimise the risk of wrongly disclosing personal or confidential information.
How to de-identify
De-identification techniques should be carefully chosen, based on a risk assessment, to ensure that personal information is protected and that a de-identified information asset will still be useful for its intended purpose.
Removing or modifying personal identifiers such as a person’s name, address and date of birth is an essential component of de-identification.
Other de-identification techniques that may be considered include:
- Removing or modifying quasi-identifiers (for example, significant dates, profession, income) that are unique to an individual, or in combination with other information are reasonably likely to identify an individual.
- Combining information or data that is likely to enable identification of an individual into categories. For example, age may be combined and expressed in ranges (25-35 years) rather than single years (27, 28 etc). Extreme values above an upper limit or below a lower limit may be placed in an open-ended range, such as an age value of ‘less than 15 years’ or ‘more than 80 years’.
- Altering identifiable information in a small way such that the aggregate information or data is not significantly affected — a tolerable error — but the original values cannot be known with certainty. A variety of methods for rounding data are described in the NSS's Confidentiality Information Sheet 4 — How to confidentialise data: the basic principles.
- Swapping identifying information for one person with the information for another person with similar characteristics to hide the uniqueness of some information. For example, a person from a particular town in Australia may speak a language that is unique in that town. Information about that individual’s spoken language could be swapped with the spoken language information for another individual with otherwise similar characteristics (based on age, gender, income or other characteristics as appropriate) in an area where the language is more commonly spoken.
- Manufacturing ‘synthetic data’, which can be generated from original data and then substituted for it, while preserving some of the patterns contained in the original data. For example, a synthetic dataset could be used to test a program that detects fraud. The synthetic data could be built to enable a test environment that replicates patterns from an authentic dataset of normal use and fraud but does not allow individuals from the original data to be identified. This allows systems to be tested with data that is realistic but poses less risk of re-identification.
- Suppressing data, which involves not releasing particular information that may enable re-identification. Data suppression may impair the utility of an information asset. An agency may wish to consider first whether other de-identification techniques will adequately reduce the risk of re-identification.
Other steps that can be taken to manage and minimise the risk of re-identification include:
- Requiring the data or information receiver to sign a contract limiting the use and distribution of the information or data, and enforcing the terms of that contract. For example, the contract could include an assurance that the receiver will not attempt to re-identify the information, and will destroy any information re-identified unintentionally.
- Limiting access to a de-identified information asset, for example, allowing other agencies or researchers to view the data rather than providing a copy.
- Assisting analysis of data rather than providing access to it, for example, running an analysis of the data and providing the result rather than the raw data.
Further advice and examples of different methods of de-identification and methods of restricting access to data are provided in the NSS’s Confidentiality Information Sheet 4 – How to confidentialise data: the basic principles and Confidentiality Information Sheet 5 – Managing the risk of disclosure in the release of microdata.
Assessing the risks of re-identification
Before releasing a de-identified information asset, an agency should confirm whether de-identification has been successful. Potential risk assessment processes may include:
- Applying the ‘motivated intruder’ test — this test considers whether a reasonably competent motivated person with no specialist skills would be able to identify the data or information (the specific motivation of the intruder is not relevant). It assumes that the motivated intruder would have access to resources such as the internet and all public documents, and would make reasonable enquiries to gain more information.
- Looking at re-identification ‘in the round’ — that is, assessing whether any entity or member of the public could identify any individual from the data or information being disclosed, either in itself or in combination with other available information or data.
The risk of re-identification will depend on the nature of the information asset, the de-identification techniques used and the context of the disclosure. Relevant factors to consider when determining whether an information asset has been effectively de-identified could include the cost, difficulty, practicality and likelihood of re-identification.
Depending on the outcome of the risk analysis and the de-identification process, information custodians may need to engage an expert to undertake a statistical or scientific assessment of the information asset to ensure the risk of re-identification is low.
The risk of re-identification may shift as technologies develop and a greater amount of information assets are published or obtained by an agency or organisation. An agency should regularly re-assess the risk of re-identification and, if necessary, take further steps to minimise the risk. This may include:
- assessing whether a higher level of de-identification is required
- assessing whether the release of additional de-identified information assets could potentially facilitate re-identification of already-published information.
The NSS, representing all government agencies, and led by the Australian Bureau of Statistics, has produced a Confidentiality Information Series which is designed to explain, and provide advice on, a range of issues around confidentialising data, including basic techniques to confidentialise data and manage risks.
The Australian National Data Service also provides materials on techniques for de-identification and ethical considerations, including:
The information provided in this resource is of a general nature. It is not a substitute for legal advice.
 ‘Personal information’ is defined in s 6(1) of the Privacy Act, as information or an opinion about an identified individual or an individual who is reasonably identifiable. For more information, see www.oaic.gov.au/privacy/what-is-covered-by-privacy.
 Other APPs that refer to de-identification are APP 4.3 (unsolicited personal information) and APP 6.4 (disclosure of personal information). See also APP 13.1 on correction of personal information. Further information about the APPs is available in the OAIC’s Australian Privacy Principle Guidelines, available at www.oaic.gov.au.
 Office of the Australian Information Commissioner, Tax File Number Guidelines 2011, published December 2011, Comlaw website, www.comlaw.gov.au/Details/F2011L02748; and Privacy Fact Sheet 6 — The binding Tax File Number Guidelines 2011 and the protection of tax file number information, last modified March 2012, Office of the Australian Information Commissioner website, www.oaic.gov.au/publications/privacy_fact_sheets/privacy_fact_sheet6_TFN_guide_2011.html.
 This definition of ‘de-identified’ is given in s 6(1) of Privacy Act.
 Eg, Anna Cavoukian and Khaled El Emam, Dispelling the myths surrounding de-identification: anonymisation remains a strong tool for protecting privacy (June 2011), Information and Privacy Commissioner Ontario website, p 4, www.ipc.on.ca/images/Resources/anonymization.pdf; and Paul Ohm, ‘Broken promises of privacy: responding to the surprising failure of anonymisation’ (2010) 57 University of California Los Angeles Law Review 1701, p 1744.
 National Statistical Service, Confidentiality Information Series, National Statistical Service website, www.nss.gov.au/nss/home.NSF/pages/Confidentiality+Information+Sheets.
 In Autism Aspergers Advocacy Australia and Department of Families, Housing, Community Services and Indigenous Affairs  AICmr 28 (www.oaic.gov.au/freedom-of-information/applying-the-foi-act/ic-review-decisions/2012-aicmr-28) the Australian Privacy Commissioner found that de-identification can be used to protect an individual’s privacy in response to a request under the Freedom of Information Act 1982 (Cth). The Commissioner has also released the Privacy Guidelines for the Medicare Benefits and Pharmaceutical Benefits Programs (www.oaic.gov.au/privacy/applying-privacy-law/legally-binding-privacy-guidelines-and-rules/privacy-guidelines-for-the-medicare-benefits-and-pharmaceutical-benefits-programs-issued-march-2008-effective-from-1-july-2008) under s 135AA of the National Health Act 1953, which establish when identifiable Medicare Benefits or Pharmaceutical Benefits claims information can be disclosed for the purposes of medical research.
 The OAIC Principles on open public sector information (www.oaic.gov.au/publications/agency_resources/principles_on_psi_short.html) provide guidance about maximising the value of public sector information.
 The OAIC Guide to information security (www.oaic.gov.au/privacy/privacy-resources/privacy-guides/guide-to-information-security) provides guidance on the reasonable steps entities are required to take under the Privacy Act to protect the personal information they hold.
 United Kingdom Information Commissioner’s Office, Anonymisation: managing data protection risk code of practice, published 2012, United Kingdom Information Commissioner’s Office, Wilmslow, Appendix 2, p 53, www.ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation.
 Available at www.nss.gov.au/nss/home.nsf/pages/Confidentiality+-+How+to+confidentialise+data:+the+basic+principles and www.nss.gov.au/nss/home.nsf/pages/Confidentiality+-+Managing+the+risk+of+disclosure+in+the+release+of+microdata.
 United Kingdom Information Commissioner’s Office, p 22.
 United Kingdom Information Commissioner’s Office, p 19.
 National Statistical Service, Confidentiality Information Series, National Statistical Service website, www.nss.gov.au/nss/home.NSF/pages/Confidentiality+Information+Sheets.