Using social media data responsibly in research

Here at Westminster, research that uses social media data and concerns the users themselves is treated as research with human participants. Your research design is therefore subject to the same ethical considerations and ethical review as if you were collecting observational data in the field.

In this blog post, we’ve compiled some introductory resources to help Westminster researchers navigate the ethics and methods of using social media data in research. 

What is social media research? 

Social media research falls under the umbrella of internet-mediated research, defined by The British Psychological Society as ‘any research involving the remote acquisition of data from or about human participants using the Internet and its associated technologies’. Research using social media data can be observational, where existing public social media datasets are extracted or sampled and then analysed, or interventional, where, for example, you use an app to monitor the effect of a treatment on a patient.

What is social media data? 

Social media data can be defined as data from any platform or application that collects data from its users. Twitter, Facebook, LinkedIn, and Instagram are primary examples, but the term also covers data that can be extracted or mined from interactive sites, platforms, and apps such as Google Maps, Peloton, Fitbit, Spotify, Netflix, Discord, and Deliveroo.

How is social media research conducted? 

Small datasets for observational analysis can be extracted or mined from social media simply by browsing the site or app. Many platforms have also developed free-to-use tools (Application Programming Interfaces, or APIs) that can be tailored to retrieve large datasets for analysis, for example by specified keywords. ‘Netnography’ describes internet-based ethnographic research on internet communities, such as those of Mumsnet or Reddit.

See our list of social media tools for researchers (staff login), including tools for extracting and visualising public social media data, social listening and trend tracking software, and links to further resources. 

How can I manage social media research ethically? 

Use an ethical framework 

If you are planning to conduct social media research, familiarise yourself with The British Psychological Society Ethics Guidelines for Internet-Mediated Research (updated June 2021) and any relevant guidance from your professional body. 

Read the Terms of Service 

Make sure you have also read the Terms of Service or User Agreement for the platform you are working with to understand both what users have consented to share, and the circumstances and purposes for which the platform allows that data to be mined and analysed. 

Gain ethical approval 

You must submit an ethics application for your research project via the VRE in advance of undertaking any work with potential ethical implications, i.e., a risk of potential harm. Even where there is no perceived risk of potential harm you should still complete the research ethics self-assessment form (go to My ethics in the VRE and complete Form Part A). 

Minimise risk

Potential risks of harm to your participants include the lack of valid consent and the potential identification of participants from your published findings. Users may be traced, extorted, ‘outed’, or politically targeted, especially if they are engaged in criminal activity or activism, or are using a social media platform that has been banned in their home country. The age of participants (social media users) is also often difficult to verify. As well as the risk of harm to your participants, you should consider personal and reputational risks to yourself.

You must consider the balance between using publicly available social media posts and protecting a user’s right to privacy. While many social media posts are in the public domain, they are not necessarily viewed as such by those who author them.  

In your published research, avoid quoting directly from social media posts and ensure that you remove or obscure any potentially identifiable, sensitive, or disclosive data (such as usernames, locations, or identifying information about ethnicity) via anonymisation, de-identification, or aggregation. You should also include a statement on how you protected user privacy.
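As an illustration of the de-identification and aggregation steps described above, here is a minimal Python sketch. The field names (username, location, text, and so on), the secret key, and the ‘City, Country’ location format are all assumptions made for the example and do not reflect any particular platform’s data. Replacing usernames with a keyed hash is only one possible approach, and on its own it may not be enough to prevent re-identification.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret key: store it separately and never publish it.
SECRET_KEY = b"replace-with-a-private-random-value"

# Illustrative field names (assumptions), not any platform's real schema.
DIRECT_IDENTIFIERS = {"username", "display_name", "email", "profile_url", "text"}


def pseudonymise(username: str) -> str:
    """Return a keyed hash so posts by the same user can still be linked."""
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def deidentify(post: dict) -> dict:
    """Drop direct identifiers and the post text; coarsen the location."""
    cleaned = {k: v for k, v in post.items() if k not in DIRECT_IDENTIFIERS}
    if "username" in post:
        cleaned["user_pseudonym"] = pseudonymise(post["username"])
    if "location" in cleaned:
        # Assumes a "City, Country" string and keeps only the country.
        cleaned["location"] = cleaned["location"].split(",")[-1].strip()
    return cleaned


def aggregate_by(posts: list[dict], field: str) -> Counter:
    """Count posts per value of a field, for reporting totals only."""
    return Counter(p[field] for p in posts if field in p)


# Made-up example records.
posts = [
    {"username": "alice", "location": "London, UK", "text": "...", "topic": "a"},
    {"username": "bob", "location": "Leeds, UK", "text": "...", "topic": "b"},
    {"username": "alice", "location": "Paris, France", "text": "...", "topic": "a"},
]
safe_posts = [deidentify(p) for p in posts]
counts_by_country = aggregate_by(safe_posts, "location")
```

Reporting only aggregated counts (such as counts_by_country) rather than individual records further reduces the chance that a user can be identified.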

Are my findings valid? 

Be mindful that while your participants (social media users) may be cognisant of the fact that their posts are in the public domain, it does not necessarily follow that they would permit their posts to be used for research purposes, or that they would voice these same views as part of a research study. You should also account for the ways in which social media discourse is influenced by bots, censorship, and culturally specific rhetorical strategies (sarcasm, puns, etc.) that can get lost in translation and skew your data. Familiarise yourself with the specific statistical methods developed for big data, and be aware of sample bias: different researchers, different APIs, and different platforms will retrieve different datasets, and these differences may be qualitatively or statistically significant for your findings.
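One simple way to see how much two retrievals of ‘the same’ data differ is to compare the sets of post identifiers each one returned. The Python sketch below uses the Jaccard index (the size of the intersection divided by the size of the union). The post IDs are made up for the example, and this is only a rough first check, not a substitute for a proper assessment of sample bias.

```python
def jaccard_overlap(ids_a, ids_b) -> float:
    """Share of post IDs common to both samples (1.0 means identical sets)."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0  # two empty samples are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical IDs from two retrievals using the same search terms.
sample_one = ["p1", "p2", "p3", "p4"]
sample_two = ["p3", "p4", "p5"]

print(jaccard_overlap(sample_one, sample_two))  # prints 0.4
```

A low overlap suggests that the two samples are far from interchangeable, and that conclusions drawn from one may not hold for the other.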

Pay attention to examples and descriptions of best practice as you conduct the literature review in preparation for your own study. 

What should I do if I am asked to publish or share my dataset? 

It is increasingly common for publishers and funders to request or mandate that researchers publish or share the research data that underpin their research findings. However, you will need to consider how to balance open data mandates with social media users’ rights to privacy, and account for the fact that some users may delete posts or change their privacy settings in between the stages of data collection and data publication.  

One strategy is to describe the exact procedure or protocol you used to extract the dataset (including any APIs you used, inclusion/exclusion criteria, keywords/search terms, etc.). Researchers who wish to reuse or replicate your dataset can then follow the protocol. Because they will not retrieve any posts that a user has subsequently deleted, any changes to privacy status in the live data are respected.
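A protocol of this kind can be recorded in a structured, machine-readable form alongside your written description. The Python sketch below is one possible layout; the class name, the field names, and all of the example values (including the platform name) are hypothetical and should be adapted to your own study.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExtractionProtocol:
    """Illustrative record of how a dataset was extracted (fields are assumptions)."""
    platform: str
    access_method: str
    search_terms: list[str]
    inclusion_criteria: list[str]
    exclusion_criteria: list[str]
    collected_from: str  # ISO 8601 date, e.g. "2024-01-01"
    collected_to: str
    notes: str = ""


def to_json(protocol: ExtractionProtocol) -> str:
    """Serialise the protocol so it can be shared in place of the raw posts."""
    return json.dumps(asdict(protocol), indent=2, sort_keys=True)


# Hypothetical example values.
protocol = ExtractionProtocol(
    platform="ExamplePlatform",
    access_method="public search API (hypothetical)",
    search_terms=["air quality", "clean air zone"],
    inclusion_criteria=["public posts", "English language"],
    exclusion_criteria=["reposts", "accounts flagged as automated"],
    collected_from="2024-01-01",
    collected_to="2024-01-31",
)
protocol_json = to_json(protocol)
```

The resulting JSON contains no user data, so it can be deposited with the publication without sharing the posts themselves.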

You must never publish or share personal data such as usernames or email addresses. 
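If any free text does need to be shared, a pattern-based scrub can catch obvious email addresses and @-style handles before release. The Python sketch below is a minimal example using two simple regular expressions, which are assumptions of this example rather than a complete definition of either format. It will miss names, nicknames, and other indirect identifiers, so it should supplement a manual check, not replace one.

```python
import re

# Simple patterns (assumptions): they cover common cases, not every valid form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HANDLE_RE = re.compile(r"(?<![\w.])@\w+")


def scrub(text: str) -> str:
    """Replace email addresses, then @handles, with neutral placeholders."""
    text = EMAIL_RE.sub("[email]", text)  # emails first, so their "@" is gone
    return HANDLE_RE.sub("[user]", text)


print(scrub("Thanks @sam_w, email me at sam@example.org!"))
# prints: Thanks [user], email me at [email]!
```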

Further links and resources: 

The British Psychological Society Ethics Guidelines for Internet-Mediated Research (updated June 2021). 

The University of Oxford Best Practice Guidance for Internet-Mediated Research (2021). 

Leeds University guidance on ‘The Ethics of Online Research’ (2018). 

Lancaster University basic Guidelines for the responsible use of social media data in research.

For some methodological considerations when working with social media data, watch this presentation on using social media data responsibly by Dr Aditya Ranganath of the University of Colorado Boulder’s Center for Research Data and Digital Scholarship. 

University of Westminster list of social media tools for researchers (staff login).

Doug Specht, Senior Lecturer and Director of Teaching and Learning in the School of Media and Communication at the University of Westminster, has also published guidance and step-by-step tutorials on Collecting, cleaning, and visualising Twitter data.

Contact us 

For support with ethics applications, contact your College Research Ethics Committee Chair.

For support with managing your research data, contact the Research Data Management Officer at 

Image by Geralt made available under the Creative Commons CC0 1.0 Universal Public Domain Dedication.
