Department of Health data leak: 2.9 million patients' sensitive medical details mistakenly exposed

The data could be used to re-identify patients and reveal sensitive information.

Highly sensitive medical data of millions of Australians was inadvertently leaked by the Department of Health. (Photo: Regis Duvignau/Reuters)

Australia's Department of Health inadvertently exposed confidential and highly sensitive health records of about 2.9 million citizens, a major error that potentially reveals their medications, pregnancy terminations, surgeries and other medical treatment. The data, drawn from the Medicare Benefits Schedule and the Pharmaceutical Benefits Scheme, was released to the public in August 2016.

The dataset, which was supposed to be anonymised, included the de-identified medical billing records of millions of Australians from 1984 to 2014.

However, researchers at the University of Melbourne discovered that the data could be traced back to the individual – without decryption – using known information about the person, such as medical procedures and year of birth.

"We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual, such as medical procedures and year of birth," Dr Chris Culnane, who conducted the study with Dr Benjamin Rubinstein and Dr Vanessa Teague, said.

"This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy."
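The linking process Dr Culnane describes can be illustrated with a minimal sketch. It is not the researchers' actual code or data: the record layout, the scrambled IDs and the `link` helper are all hypothetical, invented here to show how fields left in the clear can single out one record even when the identifier itself is never decrypted.

```python
from collections import namedtuple

# Hypothetical toy records: the ID is scrambled, but the year of birth and
# the (procedure, year) history are left in the clear.
Record = namedtuple("Record", ["scrambled_id", "year_of_birth", "procedures"])

RELEASED = [
    Record("a91f", 1970, frozenset({("knee surgery", 2009), ("childbirth", 2001)})),
    Record("c07b", 1970, frozenset({("knee surgery", 2011)})),
    Record("e3d2", 1985, frozenset({("knee surgery", 2009)})),
]


def link(records, year_of_birth, known_procedures):
    """Return every record consistent with what is publicly known about a person."""
    known = frozenset(known_procedures)
    return [
        r for r in records
        if r.year_of_birth == year_of_birth and known <= r.procedures
    ]


# Publicly known facts about a hypothetical target: born in 1970,
# had knee surgery in 2009.
matches = link(RELEASED, 1970, [("knee surgery", 2009)])
if len(matches) == 1:
    # The whole record, including the childbirth entry, is now exposed.
    print("unique match:", matches[0].scrambled_id)
```

In this toy example, two public facts are enough to isolate a single record, and everything else in that record then becomes attributable to the person.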

The Health Department removed the data a month later, after the researchers warned the government that practitioner details could be used to identify patients.

Researchers were able to identify unique patient records, matching the online records of seven notable Australians, including those of three former or current MPs and an Australian Football League (AFL) footballer.

Dr Rubinstein noted that although a unique match may not always be accurate, the researchers could increase their confidence in it by cross-referencing other data.

"Because only 10% of Australians are included in the sample data, there can be a coincidental resemblance to someone who isn't included," Dr Rubinstein said. "We can improve confidence by cross-referencing with a second dataset of population-wide billing frequencies. We can also examine uniqueness according to the characteristics of commercial datasets we know of, such as bank billing data."
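Dr Rubinstein's point about coincidental resemblance can be made concrete with a rough calculation. The sketch below is an illustration only, not the researchers' method: the `match_confidence` function is hypothetical, and it assumes every person has the same independent 10% chance of appearing in the sample. Given that exactly one record in the sample fits the known facts, it estimates the chance that the record really belongs to the target rather than to a lookalike elsewhere in the population.

```python
def match_confidence(lookalikes, sample_fraction=0.10):
    """Chance that a lone matching record in the sample is the target's.

    lookalikes: people in the whole population who fit the known facts,
    counting the target. Assumes each person is sampled independently
    with probability sample_fraction (an illustrative simplification).
    """
    if lookalikes < 1:
        raise ValueError("the target is always one of the lookalikes")
    p = sample_fraction
    others = lookalikes - 1
    # Case 1: the target is in the sample and no other lookalike is.
    target_only = p * (1 - p) ** others
    # Case 2: the target is absent and exactly one other lookalike is present.
    impostor_only = (1 - p) * others * p * (1 - p) ** max(others - 1, 0)
    return target_only / (target_only + impostor_only)


for k in (1, 2, 4):
    print(f"{k} lookalike(s) in the population: {match_confidence(k):.2f}")
```

Under these assumptions the confidence works out to one divided by the number of lookalikes, which is why a second, population-wide dataset helps: it indicates how many people could plausibly fit the same facts.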

Dr Teague added that releasing de-identified records such as health, tax, census or Centrelink data to the public "is bound to fail" since it tries to accomplish two distinct but inconsistent goals – "protection of individual privacy and publication of detailed individual records".

"We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data," Dr Teague said. "Legislating against re-identification will hide, not solve, mathematical problems, and have a chilling effect on both scientific research and wider public discourse."

The Health Department said no one has been identified despite the breach but added that it is taking the matter "very seriously" and has referred the incident to the Privacy Commissioner.

"The project was halted and remains halted, and the dataset was removed immediately," a spokesperson told News.com.au. "This matter dates back to 2016 and since then, the Australian Government has taken further steps to protect and manage data. The Department is working with the University of Melbourne and has already acted to improve its processes.

"The Department has not been aware of anyone being identified."

Senator Jordon Steele-John, the Greens' digital rights spokesperson, said the incident "can effectively be viewed as a data breach on the grandest scale".

"Legislating against misuse of this kind of data will not stop it occurring, especially when it is this easy to re-identify individuals' records," Steele-John said, according to BuzzFeed News. "What are the implications for other publicly released data sets that are supposedly 'de-identified' and secure?"
