24 August 2023
The Office of the Australian Information Commissioner (OAIC) and 11 of its international data protection and privacy counterparts have released a joint statement to address the issue of data scraping on social media platforms and other publicly accessible sites.
Data scraping technologies, which are being increasingly used to collect and process vast amounts of individuals’ personal information from the internet, raise significant privacy concerns as these technologies can be exploited for purposes including monetisation through reselling data to third-party websites, including to malicious actors, private analysis or intelligence gathering.
In recent years, the OAIC has seen increased reports of mass data scraping from social media applications and other websites that host publicly accessible personal information. This type of activity was evidenced in a joint investigation by the OAIC and the UK’s Information Commissioner’s Office into the personal information handling practices of Clearview AI, Inc. Following the joint investigation, Australian Information Commissioner and Privacy Commissioner made a determination finding that Clearview AI, Inc. had breached Australians’ privacy.
In Australia, the Privacy Act 1988 sets outs a number of obligations for organisations. In the context of data scraping:
- An organisation must take reasonable steps to protect personal information it holds from misuse, interference and loss, and from unauthorised access, modification or disclosure – including as a result of unlawful data scraping.
- Under the Notifiable Data Breaches scheme, when a data breach involving information collected through data scraping technologies is likely to result in serious harm to an affected individual, the organisation must notify affected individuals and the OAIC.
The OAIC and its international colleagues look forward to receiving feedback from various companies running social media platforms over the coming weeks about how those organisations currently comply, or intend to comply, with the expectations and principles detailed in the joint statement.
Finally, the signatories remind individuals that they can lose control over the information they make public online, and that they should consider carefully what personal information they share.
Joint statement on data scraping and the protection of privacy
- Personal information that is publicly accessible is still subject to data protection and privacy laws in most jurisdictions.
- Social media companies and the operators of websites that host publicly accessible personal data have obligations under data protection and privacy laws to protect personal information on their platforms from unlawful data scraping.
- Mass data scraping incidents that harvest personal information can constitute reportable data breaches in many jurisdictions.
- Individuals can also take steps to protect their personal information from data scraping, and social media companies have a role to play in enabling users to engage with their services in a privacy protective manner.
- Data scraping generally involves the automated extraction of data from the web. Data protection authorities are seeing increasing incidents involving data scraping, particularly from social media and other websites that host publicly accessible data.
- The capacity of data scraping technologies to collect and process vast amounts of individuals’ personal information from the internet raises significant privacy concerns, even when the information being scraped is publicly accessible.
- In most jurisdictions, personal information that is “publicly available”, “publicly accessible” or “of a public nature” on the internet, is subject to data protection and privacy laws. Individuals and companies that scrape such personal information are therefore responsible for ensuring that they comply with these and other applicable laws. However, social media companies and the operators of other websites that host publicly accessible personal information (SMCs and other websites) also have data protection obligations with respect to third-party scraping from their sites. These obligations will generally apply to personal information whether that information is publicly accessible or not. Mass data scraping of personal information can constitute a reportable data breach in many jurisdictions.
- Scraped personal information can be exploited for various purposes, such as monetisation through re-use on third-party websites, sale to malicious actors, or private analysis or intelligence gathering, resulting in serious risks to individuals as explained further below.
- SMCs and other websites should carefully consider the legality of different types of data scraping in the jurisdictions applicable to them and implement measures to protect against unlawful data scraping.
- The aim of this joint statement is to:
- outline the key privacy risks associated with data scraping
- set out how SMCs and other websites should protect individuals’ personal information from unlawful data scraping to meet regulatory expectations
- set out steps that individuals can take to minimise the privacy risks from scraping.
- We have published this joint statement for the benefit of SMCs and other websites, as well as for individuals who use and post personal information on these websites. It has also been sent directly to Alphabet Inc. (YouTube), ByteDance Ltd (TikTok), Meta Platforms, Inc. (Instagram, Facebook and Threads), Microsoft Corporation (LinkedIn), Sina Corp (Weibo), and X Corp. (X, previously Twitter).
- The practices outlined in this joint statement reflect common global data protection principles and practices, and are designed to help protect against data scraping of personal information and mitigate against its privacy impacts. While the expectations are phrased as recommendations (using the term “should”), many of them are explicit statutory requirements in particular jurisdictions or may be interpreted as such by courts and data protection authorities.
- We recognise that some SMCs have implemented controls to address data scraping of publicly accessible personal information, including for example, through court action or governance initiatives. The principles and expectations included in this open letter are informed by, and build on, that activity.
- In recent years, many data protection authorities have seen increased reports of mass data scraping from SMCs and other websites. The reports raise a number of privacy concerns, including the use of scraped data for:
- Targeted cyberattacks – For example, scraped identity and contact information posted on ‘hacking forums’ may be used by malicious actors in targeted social engineering or phishing attacks.
- Identity fraud – Scraped data may be used to submit fraudulent loan or credit card applications, or to impersonate the individual by creating fake social media accounts.
- Monitoring, profiling and surveilling individuals – Scraped data may be used to populate facial recognition databases and provide unauthorised access to authorities.
- Unauthorised political or intelligence gathering purposes – Scraped data may be used by foreign governments or intelligence agencies for unauthorised purposes.
- Unwanted direct marketing or spam – Scraped data may include contact information that can be used to send bulk unsolicited marketing messages.
- More broadly, individuals lose control of their personal information when it is scraped without their knowledge and against their expectations. For example, data scrapers may aggregate and combine scraped data from one site with other personal information, and use it for unexpected purposes. This can undermine individuals’ trust in the SMC or other websites, with potentially detrimental impacts on the digital economy. Moreover, even if individuals decide to delete their information from a social media account, data scrapers will likely continue using and sharing information they have already scraped, limiting individuals’ control over their online presence and reputation.
SMCs and other websites should protect personal information from unlawful data scraping
- SMCs and other websites are responsible for protecting individuals’ personal information from unlawful data scraping.
- Techniques for scraping and extracting value from publicly accessible data are constantly emerging and evolving. Data security is a dynamic responsibility and vigilance is paramount.
- As no one safeguard will adequately protect against all potential privacy harms associated with data scraping, SMCs and other websites should implement multi-layered technical and procedural controls to mitigate the risks. A combination of these controls should be used that is proportionate to the sensitivity of the information, and may include:
- Designating a team and/or specific roles within the organisation to identify and implement controls to protect against, monitor for, and respond to scraping activities.
- "Rate limiting" the number of visits per hour or day by one account to other account profiles, and limiting access if unusual activity is detected.
- Monitoring how quickly and aggressively a new account starts looking for other users. If abnormally high activity is detected, this could be indicative of unacceptable usage.
- Taking steps to detect scrapers by identifying patterns in "bot" activity. For example, a group of suspicious IP addresses can be detected by monitoring from where a platform is being accessed by using the same credentials from multiple locations. This would be suspicious where these accesses are occurring within a short period of time.
- Taking steps to detect bots, such as by using CAPTCHAs, and blocking the IP address where data scraping activity is identified.
- Where data scraping is suspected and/or confirmed, taking appropriate legal action such as the sending of "cease and desist" letters, requiring the deletion of scraped information, obtaining confirmation of the deletion, and other legal action to enforce terms and conditions prohibiting data scraping
- In jurisdictions where the data scraping may constitute a data breach, notifying affected individuals and privacy regulators as required.
- In addition to security controls like those mentioned above, SMCs and other websites also have a role to play in enabling users to engage with their services in a privacy protective manner. To this end, SMCs and other websites should proactively support their users so that they can make informed decisions about how they use the platform and what personal information they share. This should also involve increasing user awareness and understanding of the privacy settings they can utilise, as discussed further below.
- If any safeguards implemented to protect against data scraping involve processing of personal information, SMCs and other websites should ensure that this processing complies with any applicable data protection or privacy law requirements. As a matter of good practice and to ensure transparency, these entities should also inform their users of the steps they have taken to protect against data scraping.
- Given the dynamic nature of data scraping threats, SMCs and other websites should continuously monitor for, and respond with agility to, new security risks and threats from malicious or other unauthorised actors to their platform. Controls should be routinely stress-tested and updated to ensure that they remain effective and keep pace with changing technologies. SMCs and other websites should also collect and analyse metrics on scraping incidents, to inform and identify areas of improvement in their security control framework.
Steps that individuals can take to minimise the privacy risks from data scraping
- Although the security controls outlined above may mitigate the risks associated with data scraping, no safeguards are 100% effective and individuals should therefore be mindful that the personal information they share online may be at risk.
- While this joint statement focuses on the measures that SMCs and other websites can implement to mitigate against the risk of data scraping, individuals can also take steps to empower themselves and better protect their personal information, including:
- Think about the amount and kinds of information shared – Individuals should consider limiting the information that they post online. In particular, individuals should be cautious to limit the sharing of sensitive information and consider if sharing certain information (such as personal details, account numbers or identification numbers) may put them at risk of reputational damage, discrimination, harassment, identity fraud or theft.
- Understand and manage privacy settings – While individual-user privacy settings can only go so far in providing privacy protection, they can and should help individuals increase the control they have over how their personal information is shared online. Accordingly, website users should consider using these settings to limit the information that they make publicly accessible.
- Ultimately, we encourage individuals to think long term. How would a person feel years later, about the information that they share today? While SMCs and other websites may offer tools to delete or hide information, that same information can live forever on the web if it has been indexed or scraped, and onward shared.
- If individuals are concerned that their data may have been scraped unlawfully, or improperly, then they can contact the SMC or website, and if dissatisfied with the response, they can file a complaint with their relevant data protection authority. They may also wish to review their privacy settings and the information that they are sharing online, to make changes and remove personal information as needed.
- The expectations in this joint statement set out key areas for SMCs and other websites to focus on with a view to ensuring that they protect personal information accessible on their websites from data scraping, particularly so that they are compliant with data protection and privacy laws around the world. Protecting against data scraping will also support SMCs and other websites in building the trust and confidence of their userbase.
- SMCs and other websites can further protect their users’ information and reinforce user trust by actively informing their users of the steps they can take to protect their personal information, like those outlined above.
- We welcome any feedback from SMCs by 1 month from the issuance of this statement demonstrating how they comply with the expectations outlined in this joint statement. Any responses will be shared amongst signatories and may be published.
This statement is endorsed by the following members of the GPA’s International Enforcement Cooperation Working Group (IEWG).
Office of the Australian Information Commissioner, Australia
Office of the Privacy Commissioner of Canada, Canada
Deputy Commissioner – Regulatory Supervision
Information Commissioner’s Office, United Kingdom
Ada CHUNG Lai-ling
Office of the Privacy Commissioner for Personal Data, Hong Kong, China
Federal Data Protection and Information Commissioner, Switzerland
Head of International Section
Office of the Privacy Commissioner, New Zealand
Cielo Angela Peña Rodriguez
Deputy Superintendent for the Protection of Personal Data
Superintendencia de Industria y Comercio, Colombia
Jersey Office of the Information Commissioner, Jersey
CNDP (Commission Nationale de contrôle de la protection des Données à caractère Personnel), Morocco
Beatriz de Anchorena
AAIP (Agency for Access to Public Information), Argentina
Josefina Román Vergara
National Institute for Transparency, Access to Information and Personal Data Protection (INAI), Mexico
 A ‘bot’ – a computer program that performs automatic repetitive tasks, or a computer application designed to automate certain tasks (such as gathering information online), especially one designed to perform a malicious action. Merriam-Webster Dictionary
 A CAPTCHA is a Completely Automated Public Turing test to tell Computers and Humans Apart. This is a program that tests whether a user is a human or an automated program (e.g. a bot) (PC Mag, 'Definition of CAPTCHA'). Some examples of CAPTCHAs are programs that require a user to: interpret text that is distorted, or look at a set of similar pictures and identify which of these contain a specific object.