Australian Government - Office of the Australian Information Commissioner - Home

“One giant leap” or “Houston, we have a problem”? De-identification as a privacy enhancing tool

Does de-identification have the potential to protect people’s privacy in large public data sets, or is it a privacy ‘unicorn’?

Experts arguing for and against de-identification as a privacy-enhancing tool debated just that question at our GovInnovate workshop on 16 November in Canberra — and revealed there is some disagreement on what ‘properly de-identifying data’ means.

Dr Vanessa Teague, a computer scientist and Senior Lecturer at the University of Melbourne, said she believes it is a myth that any de-identification algorithm works.

Dr Khaled El Emam, a de-identification expert and Professor at the University of Ottawa, countered that there are models and risk metrics that do work, provided an acceptably low level of risk is determined. To illustrate standards for de-identification, he pointed to his own work on the Heritage Health Prize Claims Dataset and to the statistical risk thresholds set by European and US health agencies, courts and regulators.

The panel debated what level of risk could be accepted, and how that level should be used to determine whether publishing a large data set publicly was ‘safe’.
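
To make the idea of a statistical risk threshold concrete, here is a minimal sketch of one common way to estimate re-identification risk: group records by their quasi-identifiers and treat the chance of singling out a record as one divided by the size of its group. This is an illustration only, not the method used by any panellist or regulator. The records, the quasi-identifier fields and the 0.2 threshold are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk) for a list of dict records.

    Records sharing the same quasi-identifier values form a group.
    A record's risk is taken as 1 / (size of its group).
    """
    if not records:
        raise ValueError("records must not be empty")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    # Worst case: the record sitting in the smallest group.
    max_risk = 1 / min(group_sizes.values())
    # Mean of 1/size over all records simplifies to groups / records.
    avg_risk = len(group_sizes) / len(records)
    return max_risk, avg_risk


# Hypothetical data set: three fields that could be matched to outside data.
records = [
    {"age_band": "30-39", "postcode": "2600", "sex": "F"},
    {"age_band": "30-39", "postcode": "2600", "sex": "F"},
    {"age_band": "30-39", "postcode": "2600", "sex": "F"},
    {"age_band": "40-49", "postcode": "2601", "sex": "M"},
    {"age_band": "40-49", "postcode": "2601", "sex": "M"},
    {"age_band": "50-59", "postcode": "2602", "sex": "F"},
]

RISK_THRESHOLD = 0.2  # assumed value for illustration, not an official figure

max_risk, avg_risk = reidentification_risk(
    records, ["age_band", "postcode", "sex"]
)
is_acceptable = max_risk <= RISK_THRESHOLD
print(f"max risk {max_risk:.2f}, average risk {avg_risk:.2f}, "
      f"acceptable: {is_acceptable}")
```

In this toy data set one record is unique on its quasi-identifiers, so the worst-case risk is 1.0 and the data would fail the assumed threshold until it is further generalised or suppressed. A metric like this only counts how easily a record can be singled out; it says nothing about the harm that would follow.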

Dr Stephen Hardy, from the CSIRO’s Data61, and Dr El Emam also introduced ‘potential harm’ as an important measure of re-identification risk. Not every instance of re-identification results in new information being released; for example, addresses and contact numbers can easily be found through the White Pages. The panel explored how an assessment of the risk of harm to individuals would shape the de-identification strategies used on a particular data set.

Gemma Van Halderen, the General Manager of Strategy and Partnerships at the ABS, pointed out that the methods used to de-identify data would change over time, and that agencies responsible for datasets would have to ensure that the methods, technologies and capabilities of the agency keep up with re-identification risks.

She said that not making datasets publicly available would be a step back for Australia. Public data sets can be an immensely valuable resource for policy, planning, research and innovation.

This was also highlighted by an audience member, who stated that he didn’t want to see a remote fear of databases falling into the wrong hands hindering innovation, such as the development of health treatments. Another audience member suggested that ‘missed opportunity notifications’ could be introduced as a counterbalance to data breach notifications.

Privacy lawyer Anna Johnston said individuals were at risk if privacy falls through the gaps in publishing datasets for the public good. Individuals can be negatively impacted if their personal information is not safeguarded — which is why reaching a common understanding on de-identification standards is essential.

Australian Information and Privacy Commissioner, Timothy Pilgrim, concluded the seminar by noting the significant task ahead for the OAIC in synthesising the complexities of this issue into practical guidance for Australian agencies and businesses.

If you missed this workshop, you may want to join us at our upcoming conference, Data + Privacy Asia Pacific, on 12 July 2017 in Sydney.

The full panel:

  • Dr Khaled El Emam, Founder and CEO, Privacy Analytics; Professor, University of Ottawa
  • Dr Stephen Hardy, Group Leader, Data Platform Engineering, Data61, CSIRO
  • Paul McCarney, CEO and Co-Founder, Data Republic
  • Gemma Van Halderen, GM, Strategy and Partnerships Division, Australian Bureau of Statistics
  • Anna Johnston, Director, Salinger Privacy
  • Dr Vanessa Teague, The University of Melbourne
  • Ian Oppermann, CEO and Chief Data Scientist at NSW Data Analytics Centre
  • Timothy Pilgrim, Australian Privacy Commissioner and Australian Information Commissioner
  • Josh Taylor, Crikey (panel chair)