This post is a bit late as the DPC briefing day on Preserving Social Media was almost a month ago, but our excuse is that there was a lot of food for thought!
As digital archives trainees, Rachael and I have spent a lot of time thinking about preserving social media (a bit sad maybe, but true!). Everyone loves web 2.0: it’s dynamic and complex; it gives us the ability to communicate and interact across continents; and it’s a giant headache if you’re trying to archive it!
So as you can see we were quite excited about this briefing day, and it did not disappoint!
Throughout the day the talks were pretty evenly split between various means of capturing and curating social media and how researchers looked to access and use it, as well as the quality of datasets they were able to pull from it. They also touched on the legal ramifications of preserving it and there were a few case studies that discussed lessons learnt from institutions that are actively collecting social media.
Nathan Cunningham introduced us to the concept of the Big Data Network and the UK Data Archive. He talked about how much data and metadata the web was currently generating and the funding that the government was putting into it.
Sara Thomson’s keynote focused on different strategies for capturing and curating social media, weighing the pros and cons of Platform APIs, Data Resellers, Third-party Services and Platform Self-Archiving Services. She also argued for better integration of social media with web archives in order to contextualise it, including preserving archived copies of the pages that URLs link to. Finally, she called for more collaboration, both between institutions (in terms of resources, access and methods/knowledge) and within institutions, with their own researchers and end users.
Stephen Daisley from STV talked about Social Media & Journalism, about how it provided diverse and up-to-date coverage through non-traditional channels and its use as a tool for those underrepresented in mainstream media.
After lunch we had Katrin Weller from GESIS discuss how social scientists were using social media (For research! Not lolcats!) and the challenges of collecting, sharing and documenting it. Going back to the methods that Sara Thomson listed in her keynote, most involve a third party and place restrictions on how the data can be shared, what tools can be used on it and how much data you are given. She highlighted the difficulties this can cause when researchers want to replicate or expand upon another researcher’s work, as well as other issues that come from using data that the researcher has not collected themselves.
Tom Storrar from the National Archives rounded off the presentations with a talk on how the UK Government’s social media presence was being captured for posterity. His project was to capture the UK Government’s official Twitter presence. This involved deciding what would be in scope including content and metadata, how they would collect this data and finally how they would present it.
While I found Sara’s keynote interesting and quite informative, especially its balanced view of what is available out there and what each option has to offer, it wasn’t as relevant as I had hoped, as it focused more on someone else providing the data to you than on the tools you can use to collect what you are interested in. While there are many benefits to having authorised data resellers or the platform itself give you archiving abilities (especially being able to harvest all the associated metadata), I like the flexibility and power that we get with Archive-IT (though of course in some ways it will be a much shallower collection, as we only collect what the end-user sees) and the fact that we aren’t restricted to the data that the providers think we want.
I’m glad that she talked about the need for collaboration so that we don’t all try to reinvent the wheel. At the Bodleian we’re quite lucky because we work closely with other legal deposit libraries to capture web content (including social media), so we regularly have the opportunity to discuss and learn from each other’s experiences. We also have our own Bodleian Libraries Web Archive, which we encourage our own researchers to use as a repository and as a resource that they can help us grow.
One thing that I found problematic was Stephen Daisley’s talk. Well not problematic, but perhaps a bit naïve? While I agreed with some of his points, I think he romanticises the notion of social media as the great equaliser. I can think, off the top of my head, of at least one quite large group of underrepresented voices that are not getting their say on social media: the elderly. And I’m sure that there are many more examples that you can come up with if you stop to think about it. Just because the barrier to access is much lower than for traditional news stations does not mean there is no barrier. The vast amount of data and metadata generated makes it tempting to believe that it is the whole of the story, but I think we need to remember who isn’t part of the conversation.
I also really enjoyed Tom Storrar’s presentation because it highlighted the need to have a clear collection policy, to realise you can’t and shouldn’t capture everything, and to make your decisions transparent so that researchers will know exactly what they do and do not have to work with.
Although the talks on Big Data and social science research were less relevant to our work on the Bodleian Libraries Web Archive, they were an eye-opening introduction to the sheer amount of digital data which is collected. This might be for commercial research, profiting from the amount of information we give to social media sites, such as our name, nationality, photos, mobile number, address and interests; or for forecasting purposes, such as predicting the results of political elections; or for academic study in areas such as activism, audiences, networks and crisis communication and response. I think Katrin Weller certainly succeeded in dismissing the claim that ‘99% of tweets are worthless babble’ – Weller, Social Media as Research Data, 27/10/2015.
Like Emily, I also enjoyed Tom Storrar’s presentation on the capture of government bodies’ Twitter and YouTube feeds. For me it really highlighted how complex the web of legislation is, and how it requires the National Archives to adapt to changing circumstances. If an organisation ceases to be a government body, the National Archives no longer has the right to capture its social media content. Because of these legal restrictions, no retweets or YouTube comments are captured, which means the archive records a one-way conversation. I think this is a shame, as we are losing the interaction which is so essential to social media. If YouTube comments are the modern-day equivalent of the letters sent to the government to comment on its policies, should we be preserving them?
Overall the day was full of fascinating talks and discussions on how to move forward in preserving social media. But the best part of the briefing day was knowing we weren’t alone! We got to talk to people approaching the preservation of social media from very different angles: the BBC, the National Archives, etc. And even though we all had different mandates and different foci, we still found a lot of common ground.