Algorithmic Archive Project: Use Cases (3/3)

The Algorithmic Archive project is a one-year project funded by the Mellon Foundation. As part of the first Work Package, we explored how researchers from different disciplines use social media data to answer various research questions.

This post is the third in a three-part series presenting use cases drawn from research conducted as part of the Algorithmic Archive project.

We would like to thank the researchers who generously shared insights from their work.


Use Case – Study on the trustworthiness of social media visual content among young adults (TRAVIS project)[1]

Research questions and aim(s):

Trust And Visuality: Everyday digital practices (TRAVIS) is an ESRC project which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme. This research project looks at how young adults experience, build and express trust in news and social media images related to wellbeing and health. It explores how and why people trust some visuals over others and how content creators establish trustworthiness through visual content. The TRAVIS project involves cross-national collaboration between multiple research teams located at different universities in the UK and Europe. These include the University of Oxford, where the team is based in the School of Geography and the Environment.

Social media data used:

The project included data collected indirectly from platforms including Facebook, Instagram, TikTok and YouTube (see below).

Tools and methods adopted:

Data collection from social media consisted of screenshots taken from the devices of the young adults interviewed, as the TRAVIS project investigates the meaning of social media posts (visual content) through interviews with young adult users. The dataset generated by this method comprises around 400 screenshots, stored on an institutional cloud drive that is accessible to the whole team.


[1] Further information about the TRAVIS project is available here: https://www.tlu.ee/en/bfm/researchmedit/trust-and-visuality-everyday-digital-practices-travis

Algorithmic Archive Project: Use Cases (2/3)

The Algorithmic Archive project is a one-year project funded by the Mellon Foundation. As part of the first Work Package, we explored how researchers from different disciplines use social media data to answer various research questions.

This post is the second in a three-part series presenting use cases drawn from research conducted as part of the Algorithmic Archive project.

We would like to thank the researchers who generously shared insights from their work.


Use Case – Exploring Algorithmic Mediation and Recommendation Systems on YouTube [1]

Research questions and aim(s):

The study sought to investigate how the YouTube platform operates, focusing on algorithmic activity and the strategies employed by both human and automated (robot) actors within federal and regional elections. The aim was to understand the impact that this system of mediation has on society and to demystify preconceptions of ideologically neutral technologies in highly disputed political events. The research focuses on two case studies: 1) the 2018 Ontario (Canada) election and 2) the 2018 Brazilian Federal Election. The data collection was carried out during the campaigning periods, between May and June in Ontario, and between August and October 2018 in Brazil.

Social media data used:

The research focussed solely on the YouTube platform. Specifically, the researchers collected information about recommended videos starting from specific keywords related to the election campaign.

Tools and methods adopted:

The data collection was carried out using a Python script developed by the Algo Transparency project. The script automates YouTube search operations based on specified keywords (e.g., the names of the candidates), allowing the researcher to gather video-related data and the relative ranking position displayed to the user. Once the keywords were defined, the tool retrieved links for the top four results for each keyword and then examined the recommendation section. This process was repeated four times, each time collecting recommended videos, simulating a user interacting with algorithmic suggestions.
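The crawl described above can be illustrated with a minimal sketch. This is not the Algo Transparency script: the function name `crawl`, the in-memory `SEARCH_RESULTS` and `RECOMMENDATIONS` tables and all video IDs are invented for illustration, and no request is made to YouTube. The sketch assumes that every video reached at one step is expanded at the next, which the description above does not specify.

```python
from collections import Counter

# Hypothetical stand-ins for the platform (invented IDs, no network access):
# keyword -> ranked search results, and video -> ranked recommendations.
SEARCH_RESULTS = {
    "candidate a": ["v1", "v2", "v3", "v4", "v5"],
}
RECOMMENDATIONS = {
    "v1": ["v2", "v6"],
    "v2": ["v6", "v7"],
    "v3": ["v1"],
    "v4": ["v7"],
    "v6": ["v7"],
    "v7": ["v6"],
}


def crawl(keyword, search, recommend, top_n=4, depth=4):
    """Follow recommendations outward from the top search results.

    Returns (step, rank, source, video) rows: step 0 holds the search
    results, steps 1..depth hold the videos recommended at each round.
    """
    seeds = search(keyword)[:top_n]
    rows = [(0, rank, keyword, vid) for rank, vid in enumerate(seeds, start=1)]
    frontier = seeds
    for step in range(1, depth + 1):
        next_frontier = []
        for source in frontier:
            for rank, vid in enumerate(recommend(source), start=1):
                rows.append((step, rank, source, vid))
                next_frontier.append(vid)
        # De-duplicate (order preserved) so each video is expanded once per step.
        frontier = list(dict.fromkeys(next_frontier))
    return rows


rows = crawl(
    "candidate a",
    search=lambda k: SEARCH_RESULTS.get(k, []),
    recommend=lambda v: RECOMMENDATIONS.get(v, []),
)
# How often each video was recommended across all rounds.
recommended = Counter(vid for step, _, _, vid in rows if step > 0)
```

Recording the rank alongside each recommendation keeps the "relative ranking position" mentioned above, and the counter gives a rough picture of which videos the simulated user is steered towards most often.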

Data collected was stored on personal devices and the institutional cloud, and can be visualized at the following links:


[1] Reis, R., Zanetti, D., & Frizzera, L. (2020). A conveniência dos algoritmos: o papel do YouTube nas eleições brasileiras de 2018. Compolítica, 10(1), 35–58. https://doi.org/10.21878/compolitica.2020.10.1.333

Algorithmic Archive Project: Use Cases (1/3)

The Algorithmic Archive project is a one-year project funded by the Mellon Foundation. As part of the first Work Package, we explored how researchers from different disciplines use social media data to answer various research questions.

This post is the first in a three-part series presenting use cases drawn from research conducted as part of the Algorithmic Archive project.

We would like to thank the researchers who generously shared insights from their work.


Use Case: Network/cluster analysis to investigate the construction and influence of information trustworthiness within social movements on Twitter [1]

Research questions and aim(s):

The researcher wanted to explore the construction and influence of information trustworthiness within social media movements in the context of the Hong Kong protests and the #BlackLivesMatter movement. Social media platforms offer a digital space for social movements to facilitate the diffusion of critical information and the formation of networks, coordinate protests and reach a wider audience.

Social media data used:

This study focused on Twitter as it was used evenly by both social movements, and the researcher already had an established presence on this platform. Also, at the time of data collection (2020-2021), access to Twitter data for academic research was still relatively open to researchers.

For the purpose of this study, the researcher examined the following/follower relationships of top accounts with millions of followers that had been selected as major information disseminators, including organisations, individuals or accounts serving a particular niche or purpose.

Data collection was conducted at a specific point in time in 2021. Quantitative analysis of the social media data (e.g. cluster analysis) was complemented with qualitative data collected via an online survey.

Tools and methods adopted:

The researcher requested and obtained access to the Twitter API. However, high-level coding skills were required to access the data, which the researcher did not have at that time due to their predominantly qualitative research background. To address this, the researcher found and used a Go script called Nucoll[2], which is freely available on GitHub and enabled the researcher to collect the required data. Nucoll is a command-line tool that, according to its developer, retrieves data from Twitter using keyword instructions, for which the developer provided example queries and brief explanations. For each social movement, the researcher selected three organisations: one large organisation, one activist group, and one additional account that was relevant to the movement. Once these accounts were selected, they were processed through the script to capture all following/follower relationships and combine them into a graph for each protest analysed. Further data visualisation and analysis — including clustering and network analysis — were conducted using Gephi.
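To make the graph-building step more concrete, the following minimal sketch shows how following/follower lists for a few seed accounts could be merged into one directed edge list that Gephi can import. It does not reproduce Nucoll or its output format: the account handles, the `FOLLOWERS` and `FOLLOWING` tables and the helper names `build_edges` and `to_gephi_csv` are all invented for illustration, and no Twitter API call is made.

```python
import csv
import io

# Hypothetical data for three seed accounts (invented handles).
FOLLOWERS = {  # account -> accounts that follow it
    "big_org": ["activists", "user_a", "user_b"],
    "activists": ["user_a"],
    "niche_account": ["user_b", "big_org"],
}
FOLLOWING = {  # account -> accounts it follows
    "big_org": ["niche_account"],
    "activists": ["big_org"],
    "niche_account": [],
}


def build_edges(followers, following):
    """Return the sorted, unique directed (source, target) 'follows' edges."""
    edges = set()
    for account, fans in followers.items():
        edges.update((fan, account) for fan in fans)
    for account, friends in following.items():
        edges.update((account, friend) for friend in friends)
    return sorted(edges)


def to_gephi_csv(edges):
    """Serialise edges as a CSV edge list with Source/Target/Type columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type"])
    writer.writerows((src, dst, "Directed") for src, dst in edges)
    return buffer.getvalue()


edges = build_edges(FOLLOWERS, FOLLOWING)
edge_csv = to_gephi_csv(edges)
```

Storing edges in a set means a relationship seen from both sides (for example, once in one account's follower list and once in another's following list) is only counted once in the combined graph.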


[1] Charlotte Im, The Construction and Influence of Information Trustworthiness in Social Movements, Doctoral Thesis, University College London (UCL), 2024.

[2] https://github.com/jdevoo/nucoll

The Algorithmic Archive: a project overview

What is the Algorithmic Archive Project?

In 2024, the Algorithmic Archive Project received funding from the Mellon Foundation to carry out scoping research that will ultimately support the Bodleian Libraries in the development of a lasting, interoperable infrastructure and sustainable strategies for archiving web-based data, including social media data and algorithms. The project is part of the broader Future Bodleian programme, which aims to expand and evolve the Libraries' centuries-old role by engaging with the digital domain.

Why archive social media data?

In the past two decades, social media platforms have become a central means of communication, enabling people from across the globe to engage in discussions that transcend geographical borders, reflect on contemporary events and contribute to collective memory. Given their profound impact on society, researchers across various disciplines increasingly rely on social media data to analyse social, economic, and political phenomena. However, social media data is inherently ephemeral, subject to continuous evolution driven by changes in platform leadership, economic gain, and shifting policies. For this reason, it is essential to preserve and provide reliable and sustainable access for the (re)use of such an important resource.

Steps towards the development of a social media and algorithmic data service.

The Algorithmic Archive project is structured in four interconnected phases that investigate the research, archiving, legal and technical landscape to inform the Bodleian Libraries’ future development of a social media and algorithmic data service.

The image below offers a visual summary of the work packages that the Research Officers have been exploring over this one-year project.

In upcoming blog posts, we will present some of the results and highlight use cases drawn from research conducted with social media data.

Reporting from the Born-Digital Collections, Archives and Memory Conference 2025

From 2 to 4 April 2025, I attended the very first edition of the Born-digital Collections, Archives and Memory conference, together with my colleague from the Algorithmic Archive Project, Pierre Marshall. The conference was co-organised by the School of Advanced Study at the University of London, the Endangered Material Knowledge Programme at The British Museum, The British Library and Aarhus University. This international event offered a unique opportunity to bring together academics and practitioners from diverse disciplines, career paths and backgrounds to explore the transformative impact of born-digital cultural heritage. The diverse range of research, methodologies, and practices presented in this year’s programme offered valuable insights and reflections, particularly relevant to the Algorithmic Archive project and its goal of developing sustainable, persistent approaches to preserving born-digital heritage created on the web, especially on social media platforms.

The inspiring opening keynote by Dorothy Berry, Digital Curator at the Smithsonian National Museum of African American History and Culture, highlighted the vital importance of preserving ephemeral and fragile forms of born-digital heritage (such as social media), many of which have increasingly replaced traditional modes of memory-making. It also drew attention to the pressing need for a deeper understanding of what born-digital memory should be preserved, and how. In particular, she stressed the need to record the “full context” in which born-digital records and materials were embedded before being collected and included in specific collections. However, she also highlighted the challenges many memory institutions face due to uneven resource distribution, an issue that may hinder both the development and long-term sustainability of innovative preservation efforts.

Given the richness of the BDCAM25 programme, it is incredibly difficult to summarise the many takeaways from the three-day conference. Nevertheless, it is worth highlighting sessions such as the one exploring the history, socio-technical dynamics and research conducted on corpora from platforms such as Usenet; the important reflections stemming from a study conducted by Rosario Rogel-Salazar and Alan Colín-Arce exploring the presence of feminist organisations in web archives; and the research conducted by Dr Andrea Stanton exploring Palestine and the concept of Palestinian heritage through the analysis of accounts and hashtags on Instagram.

Particularly valuable insights also came from Dr Kieran Hegarty’s paper, which explored the challenges posed by the unpredictable and frequent changes to platform design and policies, underscoring how these changes significantly influence what is included in web archives and how the material is made available.

Beveridge Hall entrance, Senate House, University of London. Photo taken by B. Cannelli

Overall, the conference provided a valuable opportunity to learn about new research and to network with scholars and practitioners from around the globe. During lunch and coffee breaks, I had insightful conversations with several delegates about the challenges of preserving born-digital materials, particularly data generated on social media platforms. We exchanged ideas and reinforced the importance of developing shared practices to safeguard these resources. This theme strongly resonated in the closing session, which brought together voices from diverse career paths and regions to reflect on the current state of born-digital archives, collections, and memory, and to identify future directions.
Among the key takeaways were the need to foster data literacy and build digital citizens from a young age, as well as the importance of connecting with activists and minority communities to help them tell and preserve their stories.

Highlights and Takeaways from the Association of Internet Researchers Annual Conference (AoIR) 2024

At the end of October, I had the opportunity to attend the 2024 Association of Internet Researchers (AoIR) conference, which took place in the lovely city of Sheffield. This was my first time attending an AoIR conference and I was grateful to join such a vibrant meeting of Internet researchers from all over the world. As a Curatorial and Policy Research Officer for the Algorithmic Archive Project, currently exploring the ways in which social media and algorithmic data are being used across disciplines, this was a unique opportunity for me to engage with a diverse range of research on the web and social platforms.

This year’s AoIR conference was hosted by the University of Sheffield, with the Student Union building serving as the main venue. This impressive structure spans five floors and includes a cosy lounge area on the third floor, offering attendees a space to relax and network between sessions in a packed four-day programme. The main theme of the AoIR2024 conference was “industry”, inviting the research community to reflect on and discuss the relationship between the internet and industry. With over thirteen parallel sessions scheduled for each time block, choosing just one to attend proved to be rather challenging.

A view of the University of Sheffield Student Union, where some of the AoIR2024 conference sessions took place from 30 October to 2 November 2024. Photo taken by B. Cannelli

One aspect that really stood out to me from the conference was the diverse range of research involving information generated on social media platforms, spanning creator economy dynamics, news polarisation, AI applied in the context of online communities and content moderation, online pop culture and disinformation across various platforms. There were several panels discussing platform governance – the set of rules, policies and decision-making processes that shape how content is collected, accessed and used within a platform – shedding light on the power dynamics that influence user experience. From an archival perspective, understanding how platforms regulate access to data and the consumption of content is crucial, with significant implications for how this content can be archived by memory institutions.

Among the many sessions exploring virality phenomena and cultures on social media, it is worth mentioning the one reflecting on “mediated memory”. It examined how social platforms like TikTok serve, for instance, as spaces to remember displaced cultures, and how they facilitate the transmission of cultural aspects to younger generations, helping to perpetuate them through time and space. Additionally, the session titled “Times and Transformations” provided some excellent examples of research conducted with web-archived content from research libraries, along with insightful reflections on the epistemology of web archiving.

Firth Court, a Grade II listed Edwardian building that constitutes part of the Western Bank Campus of the University of Sheffield. Photo taken by B. Cannelli

Overall, the conference highlighted the crucial role social media data play in today’s communication landscape and underscored the value of platforms’ user-generated content as a key resource for researchers across a wide range of disciplines. The interplay of light and shadows explored in various panels on platform governance further emphasised the enormous power platforms hold over this user-generated data, as well as the pressing need for support to enable researchers to access and preserve these data over time. 

I left the AoIR2024 conference with so much food for thought! It has also been a fantastic opportunity for networking, which will be important for the scoping phase of the Algorithmic Archive project.