The call for nominations for the 2025 Steering Committee elections has closed. We had five vacant seats and received three nominations, so an election process is not required this year. We would like to congratulate and thank the Internet Archive, the Library of Congress, and the National Library of Australia, who will be continuing for another term.
Please find the statements of all the nominees below. The new, three-year term starts on 1 January 2026.
Nomination statements:
Internet Archive
Internet Archive seeks to continue its role on the IIPC Steering Committee. As the oldest and largest publicly available web archive in the world, a founding member of the IIPC, and a creator of many of the core technologies used in web archiving, Internet Archive plays a key role in fostering broad community participation in preserving and providing access to the web-published records that document our shared cultural heritage. Internet Archive has long served on the Steering Committee, including as Chair, and has helped establish IIPC’s relationship with CLIR, the Discretionary Funding Program, the IIPC Training Program, and other initiatives. By continuing on the Steering Committee, Internet Archive will advance these and similar programs to expand and diversify IIPC membership, further knowledge sharing and skills development, ensure the impact and sustainability of the organization, and help build collaborative frameworks that allow members to work together. The web can only be preserved through broad-based, multi-institutional efforts. Internet Archive looks to continue its role on the Steering Committee in order to bring our capacity and expertise to support both the mission of the IIPC and the shared mission of the larger community working to preserve and provide access to the archived web.
Library of Congress
The Library of Congress has been involved in web archiving for over 25 years, was a founding member of the IIPC, and has taken on a variety of leadership roles within the IIPC over the years. The Library of Congress has developed numerous thematic and event-based collections for its web archives. Our organizational structure and collecting approach, plus our permissions-based process, give us a unique perspective on collecting at scale. The Library of Congress is a long-time Steering Committee member, and its staff currently co-chair the Membership Engagement Portfolio, the Training Working Group, and the Content Development Working Group. In prior years, staff have served as Steering Committee Chair, Vice-Chair, and Communications Officer; they come to the table with deep knowledge of the administrative and organizational aspects of running the IIPC. If re-elected, the Library of Congress looks forward to continuing to focus on membership engagement, supporting training and mentorship programs, advocating for investments in tools and events that respond to the needs of members, and engaging in strategic planning work that will help strengthen and sustain the organization for years to come. Library of Congress staff find incredible value in working collaboratively with our international partners to tackle the challenges of web archiving together.
Proposed Representative: Abbie Grotke
National Library of Australia
The National Library of Australia (NLA)’s mission is to collect today what will be important tomorrow, and its Australian Web Archive (AWA) embraces collaborative partnerships and selective, domain and bulk methods to collect Australian lives lived online. Established in 1996, it now holds around 1.2 PB or 21 billion URL snapshots, and is publicly accessible through Trove, Australia’s national discovery service.
The NLA’s strengths include operational maturity, a pragmatic approach to web archiving and commitment to open access. A founding IIPC member and Steering Committee member until 2009, the NLA hosted the second general committee meeting in 2008, before being re-elected in 2019. The NLA has a long record of agile innovation, from building the first selective web archiving workflow system, to developing the OutbackCDX tool, to its current contributions to the IIPC Tools Portfolio.
Emerging from a period of major structural change, we now have restored capacity as well as renewed AWA relationships and are poised for a rejuvenation of activity both in Australia and for the international community. Kate Ross, product owner of the AWA, offers this freshly energised commitment to meaningful contribution, along with access to the NLA’s deep web archiving experience and our unique Australasian perspective.
What’s next for social media archiving at the National Archives of the Netherlands and the National Archives of Luxembourg?
by Susanne van den Eijkel (Advisor Digital Recordkeeping) and Lotte Wijsman (Preservation Researcher), National Archives of the Netherlands and Guilhem Costenoble (Archivist), Michel Cottin (Digital Curator), Maxime Detant (Archivist), and Camille Forget (Digital Curator), National Archives of Luxembourg.
This is the final blog post in a series of three about social media archiving at the National Archives of the Netherlands and Luxembourg and their collaboration on the subject.
Building connections
At the National Archives of the Netherlands (NANETH), we published our Guideline on Social Media Archiving in January 2025. Since then, we’ve kept the momentum going. We have presented our guideline to governmental organisations and the archiving community, emphasising its importance in relation to the Dutch Archival Law. In these presentations, we focused mainly on quality control based on significant properties, on the different techniques and tools available, and on how to analyse output in different file formats (see blog 2).
In the Netherlands, the need to archive social media is clear, for both heritage and legal purposes, as explained in our first blog post. However, there is no shared approach yet on how to do this, though the guideline is a good starting point. We are working on a policy for the Dutch government on the use of social media, on what exactly should be archived, and on how to archive it. This is done in collaboration with multiple experts on digital preservation, recordkeeping, policy making, and legal affairs. Additionally, the authors of the NANETH guideline have visited various Dutch government organisations to share their experiences with social media archiving, helped analyse the output, and gathered the challenges that arose during the process. This way, we are able to keep the guideline up to date and share insights that can inform the Dutch policy on social media archiving.
We also share our work, questions, and experiences within the international field. For example, we presented our work at the IIPC Web Archiving Conference (WAC) in Paris (2024) and Oslo (2025) and have had multiple meetings with experts from national archives and heritage institutions inside and outside the Netherlands. We have talked about the most suitable techniques and file formats for social media archives as well as the significant properties we find important. These exchanges helped us define the best way forward for social media archiving, which we have since shared within the Netherlands.
At the National Archives of Luxembourg (ANLux), we have prepared a guide for extracting content published on social media. This guide is distributed to archiving delegates and to communication departments within State services. It outlines the motivations and objectives of this data collection and provides step-by-step instructions. The guide is updated frequently to keep pace with changes to the platforms.
Figure 1: Archiving social media accounts. A practical guide for social media account users, by ANLux, 2025.
Targeted, time-limited campaigns enabled us to build close ties with various stakeholders. First, with the National Library of Luxembourg (BnL), which brings its expertise in web archiving, and with the State Ministry, which officially launches the campaign and sets the tone in terms of transparency, illustrating democratic vitality. Then, with the Government Press Service, which centralises the official discourse and maintains close relationships with communication officers across ministries. The first campaign even led the Government Press Service to consider social media from a new perspective: ‘Preservation by design’, starting from the moment a social media account is created.
We also discuss the results of our work at international conferences, such as the International Archives Symposium in Arnhem in 2024 and the 2025 Forum of the Association of French Archivists in Rennes, where valuable professional exchanges took place. For instance, our French colleagues became aware of the importance of archiving social media content in the lead-up to political elections, just as we were inspired by the French ministries’ long-standing experience with collection policies for email and messaging accounts.
Keeping pace with a changing landscape
At NANETH, while we share our knowledge through presentations, we are just as eager to learn from others. Social media archiving is constantly evolving. Platforms change, and so do the tools we rely on. To keep up with this changing landscape and keep the cycle of exchange going, we do several things.
First, we attend conferences and meetups, such as WAC and the Practice Network Social Media Archiving, which brings together Belgian and Dutch colleagues for workshops and inspiring talks.
Second, we keep up with new literature on the subject. Blogs such as this series are especially valuable. They show that despite working under different laws and contexts, we often face the same challenges.
Finally, we experiment whenever possible. Testing different tools helps us understand how they handle various platforms. One particular challenge we face when experimenting is our own hardware. Government laptops are heavily locked down and we are not allowed to install software, so experimenting with tools on our work laptops is impossible, especially when those tools have dependencies. We therefore often rely on old personal laptops for this kind of testing.
At ANLux, the collection of social media archives is based on a community-driven approach, built on transparency at every stage of the process. By directly involving account owners in the extraction of their data, this method ensures producer engagement and fosters the trust essential to the project’s success.
While harvesting has become more difficult in recent years, platforms still allow account owners to extract their own data, a fundamental right guaranteed by the European Union’s General Data Protection Regulation (GDPR). In other words, even if platforms tighten their restrictions on scraping and crawling, they cannot prevent users from accessing and retrieving their own data, which provides a certain degree of stability in a changing landscape.
We have also observed that our guidelines fit well with evolving practices on social media. This was particularly visible during the recent wave of users leaving X, when the guidelines were used to recover account data and transfer it to the National Archives before the accounts were closed.
Furthermore, the extractions carried out in Luxembourg use sustainable formats (CSV, JSON). Whether to convert formats after extraction is a question ANLux will need to address in the future. Is it possible (or desirable) to convert an archived account, for example into WARC files, to improve its usability and long-term preservation and to prevent obsolescence in an evolving technical environment? Although we are satisfied with the results obtained during the collection campaigns, we are still waiting to see how researchers will use these collections.
Figure 2: ANLux partnered collection cycle for social media collection with communities.
Evolving with the times
At NANETH, we believe it is not enough to share our knowledge verbally. It is also necessary to record it (we do work in an archive). That is why we created our guideline. From the beginning, our goal has been to make it a living document, one that evolves alongside platforms, tools, and our own expertise.
To keep it manageable, the guideline avoids going into detail about specific tools. Instead, we focus on broader developments, such as platform changes or shifts in archiving practices. When our knowledge advances, the guideline evolves with it.
Collaboration is key. Soon, we will establish a maintenance group with representatives from diverse Dutch organisations (city archives, ministries, universities, and regional archives). This group will review proposed updates, suggest changes of their own, and help ensure the guideline remains accurate and relevant. Through this collective effort, the document will be regularly refreshed and stay a reliable resource for all.
At ANLux, we are aware that tools and practices change. That is why we ensure our guide is up to date regarding the uses and needs of State services, as well as our policies. Evolving with the times also means exploring what has never been explored before, or exploring it in a different way, in order to collect what has not yet been collected, or has only been partially collected. Our time is paving the way for new archive collections, thanks to some collateral collections that emerged while we focused on social media archiving.
This goes hand in hand with a new approach to relationship-building, which we call partnered collection, and which is critical for the success of current and future collections. To support our collaborations, we provided a dedicated help desk and phone support that fostered trust with communication services, positioning ANLux as a supportive partner rather than a regulator. To ensure data integrity, we used a PowerShell script, which also demonstrated our technical expertise. With its “Matrix-style” scrolling lines of code, this tool made quite an impression on the producers.
Figure 3: Screen capture of a PowerShell script used by ANLux to ensure the integrity of archives.
The demystification of digital archiving turned out to be a side effect of this collection campaign. For many producers, the technical aspects of digital archiving were intimidating, as was the fear of inadvertently breaching privacy or security protocols. We provided reassurance by explaining the process and addressing their concerns about sensitive data. This pedagogical approach reduced anxiety and encouraged active participation in the archival process. Campaigns involving such simple, structured data increased the perceived quality of our archiving process and methods, and showed that archives are still archives, even those that aren’t covered in dust.
Figure 4: Fly fishing instead of net fishing: results of two collection campaigns based on the partnered collection model at ANLux.
Conclusion
The experiences of the National Archives of the Netherlands and Luxembourg reveal two complementary approaches to social media archiving. In the Netherlands, the focus is on a solid framework, with a living guideline refined through national and international exchanges. This emphasises standardisation, quality criteria, and sustainable formats to build a coherent long-term policy. In Luxembourg, the approach relies on the direct involvement of content producers and a partner-based model. By engaging governmental communication services, ANLux fosters trust, transparency, and collective ownership. This participatory method favours dialogue over automation and opens up pathways for new types of digital collections.
Together, these approaches demonstrate that preserving social media memory requires both structure and partnership, balancing rigour and adaptability in a rapidly evolving digital world. Cross-institutional collaboration has proven its worth. Think of social media archiving as fly fishing: each cast catches something, before we move on to net fishing for other new collections.
By Friedel Geeraert, Expert in web archiving, and Christina Vandendyck, developer on the BelgicaWeb research project at KBR | Royal Library of Belgium
This year’s IIPC General Assembly and Web Archiving Conference took place in Oslo at the beautiful Nasjonalbiblioteket. For the first time there were two representatives of KBR, the Royal Library of Belgium.
Within the BelgicaWeb project, the focus is on (1) investigating how to provide sustainable access to born-digital collections, (2) developing the necessary data infrastructure to access such collections, (3) enriching the (meta-)data, (4) analysing the relevant legal frameworks and (5) promoting Belgium’s born-digital heritage. The conference programme offered a lot of interesting content about these themes.
Since the programme was so rich, we started every morning of the General Assembly and the Web Archiving Conference with a short meeting to divide the sessions between us so we could discover and learn as much as possible. During the many breaks, the welcome reception and the conference dinner, we enjoyed discussing our project and the initiatives of colleagues at other organisations, and took the opportunity to join guided tours of the two excellent exhibitions at the Nasjonalbiblioteket. As always, the atmosphere was warm, collaborative and relaxed. Meeting fellow web archiving professionals (again) is always a delight.
General Assembly
The General Assembly kicked off with a beautiful video explaining the history of the Norwegian web archive. During the opening remarks, two new IIPC members were welcomed: the Publications Office of the European Union and the Common Crawl Foundation. Jeffrey van der Hoeven, the Chair of the Steering Committee, shared three concerns with the IIPC community in his address: sustainability, diversity and continuity. To address these, it was proposed to (1) offer better support to the most vital web archiving tools by developing a sustainability framework, (2) introduce a new election process to increase diversity, and (3) start keeping track of risks by means of a risk registry. These are initiatives KBR can only applaud.
The IIPC Strategic Plan for 2026-2030 was then unveiled. The priorities are (1) to sustain and support tools, (2) to share web archiving knowledge, (3) to advocate for web preservation, (4) to promote usage of web archives and (5) additional organisational priorities. Afterward, the IIPC Treasurer, Bjarne Andersen, took members through the proposed budget for the coming years.
IIPC Chair Jeffrey van der Hoeven giving the opening remarks at the General Assembly. Photo: Frode Steen | National Library of Norway
The General Assembly also included a session on the Framework for Tools Sustainability, which was based on the results of member surveys (in which a majority of respondents named tools as the top priority) and on brainstorming sessions at the previous annual event. The session aimed to gather feedback from IIPC members on the Framework, and a number of volunteers stepped forward to join the new Technical Committee that will be in charge of it.
Framework for Tools Sustainability session of the IIPC 2025 General Assembly. Photo: Frode Steen | National Library of Norway
In the afternoon of the General Assembly, there was a meeting of the Research Working Group during which the Web Data project at the Nasjonalbiblioteket was introduced. There are many similarities between this project and the BelgicaWeb project KBR is currently working on, so both research teams will definitely keep in touch. The National Library of New Zealand shared the story of their long journey with ‘whole of domain’ crawls, and the Common Crawl Foundation gave an overview of how their datasets are used for research.
Web Archiving Conference
On the first day of the conference, the keynote speaker, Javier de la Rosa, presented the Mímir project and the role of copyrighted materials in training large language models. It was interesting to hear two of its conclusions: that ‘there is empirical evidence supporting the thesis that copyrighted material improves model performance’ and that ‘to a large extent this effect seems to be mediated by non-fictional content’. There seems to be something profound about fiction that influences the large language models.
Håvard Lundberg and Ida Haugen-Poljac of Analysis & Numbers deliver the closing keynote. Photo: Frode Steen | National Library of Norway
To close the second day of the Web Archiving Conference, Håvard Lundberg and Ida Haugen-Poljac of Analysis & Numbers were invited to share their experiences using web data to quantify hate and polarisation and to map the spread of misinformation. One audience member asked what we can do to stop the spread of hate online. The answer, not to stay silent but for as many people as possible to respond to these messages of hate and form a collective shield, will resonate with us for a long time to come.
Final Thoughts
As a first-time attendee, Christina found the experience both enriching and inspiring. Being part of a community with a strong technical focus was particularly exciting for her. The opportunity to engage in discussions about the intricate technical aspects of the BelgicaWeb project with fellow developers and web archiving professionals was invaluable. She was also thrilled to discover that other institutions are working on similar projects, which opens promising avenues for future collaborations. The exchange of ideas and experiences with peers from different organisations not only provided fresh insights but also fostered a sense of camaraderie and shared purpose. We look forward to building on these connections and hope for fruitful collaborations that will advance the field of web archiving.
The Organising Committee and the Programme Committee certainly set the bar very high this year, and the programme has given us a lot of inspiration for the BelgicaWeb project. Especially promising for the project are the Pandorae visualisation tool developed by the Bibliothèque nationale de France, the OutbackCDX tool for faster indexing, Browsertrix behaviours to capture social media content, SolrWayback, the checklist to publish collections as data in GLAM institutions, the Web Data project at the Nasjonalbiblioteket, and the efforts at the National Library of Scotland to tailor web archives for different use cases.
During the closing remarks, the proverbial torch was handed over from the Nasjonalbiblioteket to KBR since the 2026 edition of the IIPC General Assembly and Web Archiving Conference will take place in Brussels. The preparations are already well underway.
We will not be able to offer Michelin-starred food like the Nasjonalbiblioteket, but since Belgium is a country renowned for its beer, chocolate, waffles, and fries, we are confident about being able to compensate for that. We look forward to welcoming you to Brussels next year!
by Cui Cui, University of Sheffield/Bodleian Libraries at the University of Oxford
The IIPC Web Archiving Conference (WAC) 2025 was held on April 9–10 at the National Library of Norway (NB) in Oslo. This event saw nearly 200 members of the international web archiving community come together to share their work and connect with colleagues, including several students who applied for and received bursaries from the IIPC for their attendance.
As a mature student, receiving a student bursary was essential in making my attendance at the conference possible. It also prompted the Information School at the University of Sheffield to provide additional funding support, which enabled me to both present my work and actively participate in the event.
The conference featured a series of high-quality presentations and provided a valuable opportunity to learn about current developments in the field, explore new practices and case studies, and engage in meaningful discussions with experts from diverse disciplines and countries. I returned feeling inspired and energised, with fresh perspectives and renewed motivation to continue my research on participatory web archiving.
I am particularly interested in participatory practices in developing, accessing, and using web archives. Across the sessions I attended, I made three main observations:
Observation One: The Growing Participation of Diverse Stakeholders in Web Archiving
Several presentations showcased efforts to engage and collaborate with the general public, members of government, local communities, and content creators, and to leverage networks of social media influencers to encourage wider participation:
These initiatives demonstrate that stakeholders—whether from government, creative industries, or the general public—can play an essential role in shaping collection development. What stands out across these projects is the importance of innovative approaches, sustained funding, and the cultivation of close partnerships to support more inclusive and representative web archives.
“Archiving the Social Media Profiles of Members of Government” presented by Ben Els. Photo credit: Bryony Hooper
Observation Two: The Expansion and Overlap of Web Archiving Into Adjacent Fields
Web archiving is increasingly expanding into the domains of digital preservation, research data management, and research database development. At the same time, these fields are recognising and incorporating web archives as valuable resources and tools. This evolving relationship reflects a dynamic interplay in which the boundaries between fields are shifting, overlapping, and mutually reshaping one another. These movements can be clearly observed in the following presentations:
“Strategies and Challenges in the Preservation of Mexico’s Web Heritage: First Steps” presented by Carolina Silva Bretón. Photo credit: Mat Kelly
Observation Three: The Wide-ranging Benefits of Web Archives
While web archives are often promoted for their potential to support large-scale analysis through big datasets, it is equally important to recognise their value in small-scale, close-reading approaches. This more nuanced use is evident in “IXP History Collection: Recording the Early Development of the Core of the Public Internet” by independent researchers Sharon Healy and Gerard Best and their co-author Lara Díaz Martínez (University of Barcelona). Similarly, Alan Colin-Arce (University of Victoria) and Rosario Rogel-Salazar (Universidad Autónoma del Estado de México) advocate for lowering technical barriers and adopting minimal computing approaches in “A Minimal Computing Approach for Web Archive Research”.
Although not as mainstream, these examples challenge the dominant narrative of “big data” in web archiving and highlight the value of smaller, curated collections and close engagement with individual resources. They underscore the importance of lowering the barriers and making web archives accessible for diverse research methods—not only for computational analysis, but also for interpretive and human-centred inquiry.
“Where Fashion Meets Science: Collecting and Curating a Creative Web Archive” by Elisabeth Thurlow. Photo credit: Bryony Hooper
Many other valuable presentations at the IIPC 2025 conference are worth discussing. I am highlighting these examples because they remind us that web archiving is still a rapidly evolving field: one that is not only expanding its reach and forming new connections with a wide range of users, communities, and disciplines, but also redefining its role within the broader landscape of libraries and archives. At the same time, there is value in returning to more traditional, foundational approaches, such as close reading and contextual interpretation, which remain essential for expanding user groups alongside the promise of large-scale, data-driven research.
by Alan Colin-Arce, University of Victoria, recipient of an IIPC student bursary to attend the 2025 IIPC Web Archiving Conference
The IIPC Web Archiving Conference 2025, held at the National Library of Norway in Oslo, was full of thought-provoking presentations on the current state of web archiving.
The conference opened with a keynote presentation by Javier de la Rosa from the National Library of Norway. He discussed the use of data from the Norwegian Web Archive as well as digitized material from the national library to build large language models in Norwegian. Their work showed some promising results, but the models cannot be made publicly available because of the copyright on the original works that have been included in the national library’s collection. Making them available would require new copyright regulations and compensation schemes.
Javier de la Rosa of the National Library of Norway delivers the opening keynote. Photo: Frode Steen | National Library of Norway
This tension between technical possibilities and social and political limitations was present in several sessions. For example, the panel on Wednesday, “Beyond Preservation: Engaging Audiences and Researchers with Web Archives“, asked pertinent questions about how to increase the awareness and use of web archives in research. Most panelists agreed that more intuitive interfaces for accessing web archives could be one way of increasing this engagement. An interesting comment in this panel was that web archives already contain diverse perspectives and communities, but that researchers currently have to dig for this diversity because it is not easily findable.
Panel 1, “Beyond Preservation: Engaging Audiences and Researchers with Web Archives”. From left: Cui Cui, Beatrice Cannelli, Andrea Kocsis, Anders Klindt Myrvoll, and Eveline Vlassenroot. Photo: Frode Steen | National Library of Norway
Later on Wednesday, the talk “Lost, but Preserved – A Web Archiving Perspective on the Ephemeral Web” offered a fascinating take on link rot, the process by which hyperlinks stop leading to their original content and return errors instead, often the infamous 404 error. In this talk, Sawood Alam (of the Internet Archive) questioned the accuracy of the statistics about the extent of link rot on the web. He argued that these statistics do not consider how many of the websites affected by link rot are archived on the Wayback Machine. When archived copies are taken into account, the prevalence of link rot is lower, although some websites linked to in news or research articles have never been archived.
“Lost, but Preserved – A Web Archiving Perspective on the Ephemeral Web” presented by Sawood Alam. Photo: Alan Colin-Arce | University of Victoria
“A Minimal Computing Approach for Web Archive Research” presented by Alan Colin-Arce. Photo credit: Rosario Rogel-Salazar
Later in the day, there was a fascinating session on social media archiving in three different countries: the Netherlands, Luxembourg, and Singapore. While the approaches were different, they were all equally innovative. The Netherlands developed institutional guidelines at their national archives before starting with social media archiving. Luxembourg used a participatory approach that encouraged members of government to voluntarily share their public social media data with the national archives, and they persuaded their Prime Minister to be the first person to share this data. Singapore used a similar participatory approach with cultural organizations, but they also encouraged everyday Singaporeans to donate their social media posts to the National Library of Singapore for specific nationwide events or special topics. To promote this approach, they hired two influencers to give talks and publish Instagram posts to raise awareness of social media archiving.
“From Posts to Archives: The National Library of Singapore’s Journey in Collecting Social Media” presented by Shereen Tay and Meiyu Lee. Photo credit: Bryony Hooper
The diversity of topics and discussions shows that there are many innovative and creative ways to create, maintain, and use web archives, from small-scale projects to nationwide initiatives. The IIPC conference was a great experience that allowed me to be in a room where everyone understood the importance of preserving the web and making sense of it. It was inspiring to see how everyone in attendance was thinking about how to improve these processes to make them more accessible, diverse, and efficient.
The Steering Committee (SC) is composed of no more than fifteen Member Institutions. SC Members provide oversight of the Consortium and define and oversee action on its strategy. This year, five seats are up for election.
What is at stake?
Serving on the Steering Committee is an opportunity for motivated members to help guide the IIPC’s mission of improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage. SC members are expected to actively contribute to the leadership and governance of the organization.
Every year, three SC members are designated as IIPC Officers (Chair, Vice-Chair and Treasurer) to serve on the IIPC Executive Board and are responsible for implementing the Strategic Plan.
The SC members meet in person (if circumstances allow) at least once a year. Face-to-face meetings are supplemented by two teleconferences plus additional ones as required. The key tasks for the upcoming term include guiding the implementation of the new Strategic Plan and Consortium Agreement.
Who can run for election?
The SC shall ideally reflect a diverse range of types and sizes of member organizations and roles within the web archiving community and represent the geographic spread of membership. Participation in the SC is open to any IIPC member in good standing. We strongly encourage any organisation interested in serving on the SC to nominate themselves for election.
Please note that the nomination should be on behalf of an organisation, not an individual; however, we do ask that you include the name of the likely representative in the nomination statement. The list of current SC member organisations is available on the IIPC website.
How to run for election?
All nominee institutions, whether new nominees or existing members whose terms are expiring and who wish to continue serving on the SC, are asked to write a short statement (no longer than 200 words) outlining their vision for how they would contribute to the IIPC by serving on the SC. Statements can point to current and past contributions to IIPC activities (e.g. through collaborative projects, conference hosting, participation in the SC, Working Groups or task forces), relevant experience or expertise, new ideas for advancing the organisation, or any other relevant information. View past nomination statements here.
All statements will be posted online and emailed to members prior to the election, giving all members ample time to review them. The results will be announced in November, and the three-year term on the Steering Committee will start on 1 January.
Below is the election calendar. We are very much looking forward to receiving your nominations. If you have any questions, please contact the IIPC Senior Program Officer (SPO).
Election Calendar
2 July – 1 October 2025: Nomination period. IIPC Designated Representatives are invited to nominate their organisation by emailing the IIPC SPO. The nomination statement should be no longer than 200 words.
2 October 2025: Nominee statements are published on the Netpreserve blog and circulated to the Members mailing list. Nominees are encouraged to campaign through their own networks.
2 October – 3 November 2025: Members are invited to vote online. The vote is cast by the Designated Representative.
5 November 2025: The results of the vote are announced on the Netpreserve blog and Members mailing list.
1 January 2026: The newly elected SC members start their three-year term.
Techniques and Tools for Social Media Archiving: Casting the Right Fly or Net
by Susanne van den Eijkel (Advisor Digital Recordkeeping) and Lotte Wijsman (Preservation Researcher), National Archives of the Netherlands and Guilhem Costenoble (Archivist), Michel Cottin (Digital Curator), Maxime Detant (Archivist), and Camille Forget (Digital Curator), National Archives of Luxembourg.
This blog post is the second of three about social media archiving at the National Archives of the Netherlands and Luxembourg and their collaboration on the subject. The first blog can be found here.
Over the past years, we have seen various social media platforms change. Each platform has its own rules and restrictions on retrieving information, so different techniques are needed to archive its content properly. Social media archiving has also become an increasingly discussed topic. There is no universal method to collect social media. Choices depend on institutional goals, legal and technical constraints, and evolving platforms. Despite the challenges, practical guidelines can support the process.
Defining the significant properties of social media
In the first blog, we defined social media and provided context for different policy approaches. The next step is to look at the live web and decide what you consider to be the significant properties of social media posts and accounts. In other words: which elements have to be included in the archived version of social media? Is it important to show images, GIFs and videos, or is text enough? Don’t forget to consider emojis. They can have a specific meaning in a specific context. How do you make sure you archive that context as well? Significant properties are defined as “the characteristics of digital objects that must be preserved over time in order to ensure the continued accessibility, usability, and meaning of the objects” (JISC, 2009). The properties are divided into five categories:
Content: This concerns what the content, such as text or an emoji, expresses, not its form. For example, it is not only about seeing a “thumbs up”; it also has to be clear what that “thumbs up” means.
Context: Metadata, such as author and date, clarifies who posted something and when. It therefore gives context to a message.
Appearance: Think of color and layout, for example the specific blue of Facebook, or the bird or X logo of Twitter/X. This determines whether the archived view is authentic compared with the live version.
Behaviour: Interaction and functionality. For example, a form you can fill in or a video you can play.
Structure: The different parts of a post, and posts themselves, can be related to one another, such as posts from one account that are shared by another account (reposts).
Figure 1: Example of the category behaviour. Those elements that have a border have to act in a specific way. For example, the link has to refer to a new web page.
Facing technical constraints
Since the mid-2010s, major social media platforms have progressively restricted access to their data, complicating archiving efforts. Initially driven by the protection of their commercial interests, these restrictions have intensified over the years. Twitter, for example, limited the sharing and transfer of data collected via its APIs as early as 2016, before easing these restrictions for non-commercial research between 2017 and 2020. However, in 2020, the introduction of the “Twitter Academic” API, separate from the paid general API, excluded most archival institutions.
An overview of Social Media Data Policies is shown in Figure 2 below. It is based on feedback from Ben Els (National Library of Luxembourg), Archive-It and Vladimir Tybin (National Library of France), and on the thesis of Beatrice Cannelli (2024).
Figure 2: Evolution of Social Media Data Policies, that shows how access to platforms has altered over time.
Unfortunately, large-scale harvests no longer deliver the same quality. The restrictions introduced within a limited period should lead us to rethink how to collect data more deliberately and to make tactical choices. It was by observing this gradual enclosure of the platforms and the irregular changes to their rules that the National Archives of Luxembourg opted for a technique that extracts data through the settings of individual accounts. This option still appears to be available in a relatively uniform way across platforms.
This individual extraction method adopted by the National Archives of Luxembourg for collecting social media archives offers a major advantage in terms of sustainability. Unlike web harvesting tools, which face increasingly strict restrictions imposed by platforms, individual extraction relies on a fundamental right guaranteed by a European Union Regulation, the General Data Protection Regulation (GDPR): the right to data portability. Article 20 of the GDPR requires data controllers, including social media platforms, to allow users to retrieve their personal data in a structured, commonly used, and machine-readable format. In other words, even if platforms tighten their restrictions on scraping and crawling, they cannot prevent a user from accessing and retrieving their own data.
The National Archives of the Netherlands have researched multiple techniques to archive social media, keeping these limitations in mind. As a result, we recommend using more than one technique to archive the material, to capture as many of the significant properties as possible. Ideally, this means choosing one method that focuses on the text and another that makes sure to archive the look and feel of the platform. We briefly describe four techniques here, with their pros and cons.
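The recommendation to combine techniques can be made concrete by treating each technique's coverage of the five significant properties as a set and choosing the pair whose union covers the most. This is a minimal sketch: the coverage values below are hypothetical and for illustration only, not the scores from the NANETH guideline, and the function name `best_pair` is our own.

```python
from itertools import combinations

# The five categories of significant properties described above.
PROPERTIES = {"content", "context", "appearance", "behaviour", "structure"}


def best_pair(coverage: dict[str, set[str]]) -> tuple[tuple[str, str], set[str]]:
    """Return the pair of techniques whose combined coverage is largest.

    Ties are broken by alphabetical order of the technique names.
    """
    pairs = combinations(sorted(coverage), 2)
    best = max(pairs, key=lambda p: len(coverage[p[0]] | coverage[p[1]]))
    return best, coverage[best[0]] | coverage[best[1]]


# Hypothetical coverage values, for illustration only.
coverage = {
    "api": {"content", "context", "structure"},
    "screen_capture": {"content", "appearance"},
    "web_harvesting": {"content", "appearance", "behaviour", "structure"},
}

pair, covered = best_pair(coverage)
missing = PROPERTIES - covered
print(pair, sorted(missing))
```

With these made-up values, a text-oriented method (the API) paired with a look-and-feel method (web harvesting) covers all five categories, which mirrors the reasoning in the paragraph above.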
API
API is short for Application Programming Interface. It allows computers to ‘talk’ to each other: one asks, the other responds. In practice, this means that a user can type in a web address to open a specific web page (request) and the computer makes sure it shows the web page that is requested (response). This works for social media, as each post is a little web page in itself.
Apart from requiring some technical background (because you have to understand how the API works and how to connect to it), this used to be a fairly easy way to access social media data. The outcome is structured textual data that can be saved as a CSV file. This could be opened in Excel, for example, and would provide different columns that include the text of the post, the publisher, the location, the timestamp, et cetera.
However, in recent years access to social media platforms’ APIs has been restricted. Most notably, you now have to pay for access. Furthermore, the platforms are not transparent about what they share. This makes it almost impossible to establish how complete and authentic this kind of data is.
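To illustrate the structured output described above, here is a minimal Python sketch that flattens a JSON list of posts into CSV columns. The payload and its field names (`text`, `author`, `location`, `created_at`) are hypothetical; every platform's API uses its own schema, authentication and pagination, none of which is shown here.

```python
import csv
import io
import json

# Hypothetical API response: a JSON list of posts. Real platforms use their
# own field names, nesting, authentication and pagination.
RESPONSE = """[
  {"text": "Example post 1", "author": "example_account",
   "location": "Example City", "created_at": "2024-01-01T10:00:00Z"},
  {"text": "Example post 2", "author": "example_account",
   "location": null, "created_at": "2024-01-02T09:30:00Z", "likes": 12}
]"""

FIELDS = ["text", "author", "location", "created_at"]


def posts_to_csv(payload: str) -> str:
    """Flatten a JSON list of posts into CSV text with one column per field."""
    posts = json.loads(payload)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for post in posts:
        # Keep only the chosen fields; missing or null values become empty cells.
        writer.writerow(
            {field: "" if post.get(field) is None else post[field] for field in FIELDS}
        )
    return buffer.getvalue()


print(posts_to_csv(RESPONSE))
```

Note that anything not listed in `FIELDS` (here the `likes` count) is silently dropped, which is one small example of why the completeness of API-derived data is hard to judge.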
“Download” function within social media accounts
Most social media platforms offer the option to download your own data. This is the most basic option: no technical skills are needed to retrieve the data. The downloaded package usually contains an index HTML file that you can open in your browser. You can navigate through it without being logged in to any social media account, or even connected to a network, while keeping the feeling of navigating the original social media website. In addition, you receive separate files, such as JSON, JavaScript, CSV, JPEG and MP4 files. In these files, you can find the content of the publications (textual or visual, with images and videos), their context (publication date, and the date taken for audiovisual content), information about the account, advertisements and followers.
The result closely resembles the live platform, but there are some differences to keep in mind. The most important point in Luxembourg’s collection principles is that the archive only contains publications from the account owner. This principle is the subject of a collection agreement between ANLux and the ministries. Furthermore, the image files are smaller than those originally posted on the live web.
Note that the content of your conversations with others may be included, meaning that if you or the person you corresponded with shared personal data, this data is now included in your archive download. However, you can choose to exclude private message data before the account extraction, or remove it afterwards.
The main difficulty with this method is that you need to reach out to users who have access to the accounts you wish to archive and convince them to perform the data extraction. This requires active participation and collaboration between the data producers and the archiving institution, leading to another form of collection that we have called “partnered collection”. This is the method ANLux chose, with good results for targeted collection campaigns with a limited scope.
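Because the export package mixes many file types, a quick inventory is a useful first check before ingest. The sketch below counts files by extension and flags `.json` files that fail to parse. It assumes the package has already been unzipped into a folder; folder layouts and file names differ between platforms, some exports wrap their data in JavaScript (`.js`) files rather than plain JSON (this check does not inspect those), and the function names are our own.

```python
import json
from collections import Counter
from pathlib import Path


def inventory(export_dir: str) -> Counter:
    """Count files in an unpacked export package by lower-case extension."""
    return Counter(
        path.suffix.lower().lstrip(".") or "(none)"
        for path in Path(export_dir).rglob("*")
        if path.is_file()
    )


def unreadable_json(export_dir: str) -> list[str]:
    """List .json files (as relative paths) that cannot be parsed as JSON."""
    root = Path(export_dir)
    bad = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".json":
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (json.JSONDecodeError, UnicodeDecodeError):
                bad.append(path.relative_to(root).as_posix())
    return bad
```

For example, `inventory("export")` on a hypothetical unpacked folder named `export` would return counts such as the number of `html`, `json`, `jpg` and `mp4` files, which can be compared with what the platform says the package should contain.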
Screen capturing
A screen capture is a static or dynamic recording of your screen. This could be a still image (screenshot) or a video. Multiple tools are available that let you capture the content automatically or manually. This method results in an image, PDF or video of the content. Only what is visible on your screen is recorded, which means that, if you are doing this manually, you have to open every separate post and reaction in order to capture it. If it is automated, you need a way to check whether everything was opened. As you can imagine, this is very exhausting work.
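One way to check that an automated capture opened everything is to compare the list of posts you expected to capture with a log of what was actually captured. This sketch assumes a hypothetical capture log that maps each URL to the file it produced (an empty string meaning the capture failed); the URL normalisation is deliberately light, so that trailing slashes or upper-case host names do not cause false alarms.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case scheme and host, drop trailing slashes and fragments."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def missing_captures(expected_urls: list[str], capture_log: dict[str, str]) -> list[str]:
    """Return expected URLs with no successful capture, in their original order."""
    captured = {normalise(url) for url, filename in capture_log.items() if filename}
    return [url for url in expected_urls if normalise(url) not in captured]
```

The list of expected URLs has to come from somewhere else (for example, a list of posts compiled by the account owner), which is part of why screen capturing is labour-intensive.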
Web harvesting
Web harvesting (also known as ‘scraping’ or ‘crawling’) is the most common web archiving method. Software archives web content automatically, and the result is often a WARC or WACZ file. WARC is the standard file format for web archives. In NANETH’s opinion, this approach is also the best suited to social media archives. WARC files contain not only the content of the web pages but also some metadata. WARC is a container format that holds all the archived files, whatever their format, and a special reader can display them in the correct order on your screen. There are multiple free tools available to harvest web pages and to render the files in a WARC reader. Most preservation systems include a viewer that can render WARC files as well.
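To show what “a container format with metadata” means in practice, the sketch below builds and reads back a single WARC/1.1 response record using only the Python standard library. It is deliberately minimal: it omits the warcinfo record, payload digests and gzip compression that real crawlers write, the target URL is a placeholder, and the function names are our own. For production work, an established library such as warcio is a better choice.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, http_payload: bytes) -> bytes:
    """Build one WARC/1.1 'response' record wrapping a raw HTTP response."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # Each record ends with its content block followed by two CRLFs.
    return head.encode("utf-8") + http_payload + b"\r\n\r\n"


def read_record(record: bytes) -> tuple[dict[str, str], bytes]:
    """Split a single WARC record into its header fields and content block."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    if not lines[0].startswith("WARC/"):
        raise ValueError("not a WARC record")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    return fields, rest[: int(fields["Content-Length"])]


payload = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<p>Post</p>"
record = warc_response_record("https://example.org/post/1", payload)
fields, block = read_record(record)
```

The metadata (what was captured, from which URL, and when) travels in the record header, while the original HTTP response, including its own headers, is kept byte for byte in the block. This is what lets a WARC reader replay the capture later.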
Conclusion: Return of the fisher to the port
Having multiple collection methods is always an advantage. Knowing how to combine them is a challenge. In heritage institutions, collecting everything requires method and discernment. The experience of the National Archives of Luxembourg in applying traditional archival collection processes to social networks has demonstrated the feasibility of a social media collection, though key differences from web harvesting remain in scope and archival status. A partnered collection cannot replace all harvesting techniques. To date, we have not yet combined different collection techniques, for example on particular communities or themes: the fishers must sometimes venture into deeper waters or even further out to sea.
Figure 3: Multiple collection approaches are to be evaluated.
By lead curator Melissa Wertheimer, Senior Digital Collections Specialist for Web Archiving at the Library of Congress and Content Development Group Co-Chair.
The Co-Chairs of IIPC’s Content Development Working Group (CDG) invite the public to contribute web-based content to a new event-based collaborative web archive collection: the World War Two 80th Anniversary Commemoration Web Archive. This collection relates thematically to IIPC’s 2015 World War I Commemoration Web Archive.
The year 2025 is the 80th anniversary of the end of World War II, which concluded in 1945. This year includes specific dates of note that commemorate events that led to the war’s end. The 80th anniversary of Victory in Europe Day (VE Day) is 8 May 2025. The 80th anniversaries of the atomic bombings of Hiroshima and Nagasaki fall on 6 and 9 August 2025. The 80th anniversary of Victory over Japan Day/Victory in the Pacific Day (VJ Day/VP Day) is 15 August 2025 in the United Kingdom and 2 September 2025 in the United States.
What we are collecting
This event-based collaborative collection between IIPC members and the public will include websites and individual web pages that document anniversary events, including:
physical and online sites of memory, such as memorial ceremonies, veterans’ activities, and military cemeteries;
physical exhibits and related events, such as those hosted by museums and historical societies;
information on commemorative works of literature, visual art, and performing arts;
memorial events related to specific battles in the European, Pacific, and North African theaters; and
memorial events that commemorate specific events including VE Day, VJ Day, the Holocaust and concentration camp liberations, and the atomic bomb attacks on Hiroshima and Nagasaki.
Websites and webpages that represent the geo-political and linguistic breadth of the war’s participants are vital to the commemorative nature of this web archive. Online news articles that cover 80th anniversary commemorations, interviews, and ceremonies are especially valued. Complete websites and individual web pages created by organizations, institutions, and groups whose founding origins or missions relate directly to promoting the memory of the Second World War and its social, political, economic, and artistic impacts are also vital to the authenticity of the collection.
Out of scope
Online content created by Holocaust deniers is out of the scope of this collection; however, legitimate online textual content that documents the existence and dangers of Holocaust denial is within scope, especially pertaining to the anniversary year.
Social media is excluded from this collection for technical reasons: to maximize available data for websites and individual webpages and to ensure capture quality over the short duration of the event-based crawls.
How to participate
Members of the public may nominate URLs by using this online form. Three crawls will run, during the first weeks of June, August, and September 2025.
For more information and updates, you can contact the IIPC Content Development Working Group team at [email protected].
Strategies Beyond Web Harvesting: Net or Fly Fishing?
by Susanne van den Eijkel (Advisor Digital Recordkeeping) and Lotte Wijsman (Preservation Researcher), National Archives of the Netherlands and Guilhem Costenoble (Archivist), Michel Cottin (Digital Curator), Maxime Detant (Archivist), and Camille Forget (Digital Curator), National Archives of Luxembourg.
This blog post is the first of three about social media archiving at the National Archives of the Netherlands and Luxembourg and their collaboration on the subject. This first part focuses on the initiatives undertaken to archive social media and on their scope. The second will provide an overview of tools and techniques, and the third will discuss advocacy.
Pieces of the Same Puzzle
As National Archives, we are responsible for the selection, evaluation, maintenance, preservation and access to physical and digital information. Information can take many forms, and social media is one of them. Internationally, there has been increasing attention on how to archive and preserve social media content. The ongoing instability surrounding social media content, including efforts to restrict web scraping, has made its long-term preservation a challenge that crosses professional and geographical boundaries for archivists and librarians. Government communication, in particular, presents a sensitive issue for institutions focused on the preservation of historical heritage.
During the International Archival Symposium in Arnhem, the Netherlands, in spring 2024, colleagues of the National Archives of Luxembourg (ANLux) and the Netherlands (NANETH) started a conversation about our respective experiences on the topic of social media archiving. We shared knowledge on the topic via various online meetings and found out that both organisations had pieces of the same puzzle. We spoke of the projects that we had executed within our own organisations, the legal and technical challenges, and our experiences with selection policies, tools and preservation of the material. This blog series is the result of this collaboration.
All Roads Lead to Rome
We found that we had a similar problem but took different paths to find solutions for archiving social media content. ANLux has been promoting the download function that platforms offer, which allows users to export an ‘archive’ of their own data. The method is now published in practical guidelines for Luxembourg public sector organizations, which are updated each year to keep pace with changes in social media. NANETH, on the other hand, researched different techniques to archive social media content, and the results have been published in guidelines for Dutch government organisations. There is one major difference between our approaches. ANLux actively archives social media, in collaboration with the public administrations and ministries, as part of its daily business and through annual campaigns. NANETH, however, is not responsible for archiving the material. We only advise on how to archive social media and research different techniques. NANETH is responsible for the long-term preservation of the content once it is transferred to the National Archives.
In Luxembourg, archiving social media was one of the legal deposit responsibilities of the National Library of Luxembourg (BnL). But, as time went by, the BnL faced the challenge of adapting traditional web archiving methods, whose ability to capture social media was increasingly limited by the platform publishers, while ANLux had been developing retention schedules for Luxembourg public sector organizations since 2018. Moreover, ANLux can rely on a dense network of archiving delegates within these organizations. As a result, the BnL requested the support of ANLux, which developed a procedure to extract government-related accounts in strict compliance with the retention schedules. This led to a first partnered campaign at the end of 2023, with a collection scope limited to the Ministers of Xavier Bettel’s government (2018-2023), which yielded significant results.
In the Netherlands, the Dutch Archival Law (1995) states that ‘documents, whatever their form, received or prepared by public authorities’ have to be archived. This includes government information on the various social media platforms, which means the National Archives are obliged by law to permanently store transferred material. This contrasts with the role of the Dutch National Library (KBNL), which does not have a legal mandate to preserve (digital) material for the long term. In 2020, the first social media archiving projects in the Netherlands started, in collaboration with the Dutch Digital Heritage Network. Since then, there have been multiple initiatives and projects to research how best to archive social media, as shown in Figure 1.
In 2025, NANETH published a guideline on archiving social media. The main focus is government organisations, but the guideline is suitable for heritage institutions as well. Dutch government institutions are responsible for archiving their own content. As National Archives, we can only advise on how to do it, and what’s best for the long-term preservation of the content. For example, in the guideline, we have explained that the method you choose to archive the material determines the information object you create. The guideline will be used as a basis for policymakers and has been developed by experts in the field of social media archiving from the Netherlands and Belgium.
Figure 1: Timeline of social media archiving initiatives in the Netherlands.
In Luxembourg, the law of August 17, 2018 relating to archiving states that “all documents, including data, regardless of their date, place of storage, material form and medium, produced or received by any natural or legal person and by any public or private service or body in the exercise of their activity” are archives that must be preserved. In addition, this law makes it mandatory to develop retention schedules for public sector organisations. In these retention schedules, a specific category is dedicated to the content of social networks, whose final disposition is permanent preservation. ANLux has published a practical guideline for government services with the procedures for extracting data from individual accounts on the social media platforms used for government communication, mainly Facebook, LinkedIn, Instagram and Twitter/X. These practical procedures are adapted from the “user guides” of each platform and include recommendations on the types of metadata to preserve as well as limitations to comply with the GDPR. This document is supplemented by the archival transfer procedure in the administrative sense, as well as recommendations for data transfer techniques.
On the occasion of the governmental transition of 2023, ANLux decided to launch a campaign to collect social media content published by each minister. This project was carried out in collaboration with the BnL, the Government press service, and the Prime Minister’s office. To prepare this campaign, ANLux had the opportunity to carry out a collection test based on the Facebook account of the Prime Minister, who agreed to set an example. This helped not only to determine the selection criteria, but also to pave the way for other ministers, showing that individual extractions are still possible and pose no risk to privacy. More on this will be explained in the second blog.
Figure 2: Timeline of social media partnered collection in Luxembourg.
The NANETH guideline explains the different techniques available for archiving social media, linking them to the principle of significant properties and (where possible) archiving by design. For example, we explain how the extraction of data with the API works, what the outcome is (e.g. in which file format the data is saved and how it can be viewed) and what this means for the trustworthiness and authenticity of the material. More on the techniques can be found in blog 2. Rather than exploring all the available techniques, ANLux has proposed a more accessible solution: downloading the users’ own content, as long as this option is available. ANLux also provides information on which formats are best for long-term preservation. There is no right or wrong in this, as there are different roads to take. However, our destination is the same. We try to safeguard the information from this relatively new medium for future generations.
Exploring the Scope of Social Media
Before starting with social media archiving, it is important to define what we mean by social media. The term has been in use since the 1990s “to indicate a new medium that enabled social interaction between users on the web” (Cannelli, 2024). This also means you have to decide which social media platforms to focus on. The ones that are the main focus for both Luxembourg and the Netherlands are X (Twitter), Facebook, LinkedIn, and Instagram. Perhaps we will need to reconsider the scope of archived platforms in the future, with the use of new social networks and the decline or even disappearance of other platforms. For example, ministries and ministers are starting to create TikTok accounts in Luxembourg, while many public organisations have already made the decision to leave X (Twitter). In the Netherlands, some of these organisations are shifting to Bluesky and Mastodon.
At NANETH, we adopted a rather broad definition of social media. We consider everything the Dutch government publishes on social media as information we want to archive. This includes, for example, communications with Dutch citizens about the COVID-19 lockdown in 2020 or initiatives such as the national garden bird-counting event. Additionally, we believe that interactions with citizens on the platforms are important to archive as well. If a government organisation responds to a third-party post, we want to archive both the reaction and the original post to ensure completeness. It is important to keep this in mind, as the method you choose for archiving social media plays a big part in the information object you create. However, private messages are not in scope for social media archiving. They will be covered by a direct messaging policy that is under development, which also covers chat apps such as WhatsApp and Signal.
ANLux chose to follow two main principles to determine their collection strategy: the principle of preserving intentional and official publications from the government and the public sector organisations, and the principle of reduced community collection. The first principle means that only the elements which had been the subject of a publication (in the sense of legal deposit) are preserved. This is why the focus is on public posts, notably excluding interactions with the followers (such as private messages or comments) and social network accounts that did not have the status of government communication. This selective approach ensured that only publicly accessible information was archived, respecting privacy and confidentiality.
The second principle, collecting material dedicated to a specific community, came into play from the beginning of the project. The very fact that ANLux would only collect social media accounts mentioned in the retention schedules has consequences for the scope of the collection. This principle is also largely shaped by a factor beyond the National Archives’ control: the conditions of use and extraction permitted by the publishers of the targeted social networks. Indeed, automated harvesting with robots is limited, and it seems certain that social media platforms will not go back to allowing mass harvesting.
The partnered collection method that we chose at ANLux has the advantage of involving the data producer in setting the rules of the game, as with more traditional archive transfers. Let’s take up the challenge with all its opportunities, but also its share of constraints. This method allows us to envisage broadening the scope initially targeted in the annual campaigns. Indeed, these campaigns made ANLux more approachable to the ministries and administrations, whether for collecting social media accounts or other types of archives. Since these campaigns, other organizations outside the scope of the archiving law have informed us that they are collecting their social media accounts, for example when they decide to leave certain platforms.
To figure out what needs to be included when archiving social media, NANETH has used significant properties, a strategy also used by the Library of Congress. In our guideline on social media archiving, we have assessed multiple techniques and scored them on various criteria. Among them are the five categories of significant properties: content, context, appearance, behaviour, and structure. In the next blog, we will provide more information about tools and techniques, and compare the different approaches of ANLux and NANETH.
Whether you’re a devoted fan of net fishing or fly fishing, the collection of government social media is an ever-evolving sport. Between web harvesting and partner-driven collection, our Luxembourgish and Dutch approaches explore different strategies to preserve these fragile yet essential digital traces. If this first blog post has helped define the stakes and scope, the game is only just beginning!
In our next blog post, we’ll dive into the technical depths of collection methods, results and statistics before dedicating a third post to the challenges of advocacy and integrating these practices into digital preservation strategies. Stay in the loop with fishers and keep the feed rolling!
By Helena Byrne, Curator of Web Archives, British Library
The 2024 Summer Olympics and Paralympics held in Paris were record-breaking events. As with previous Games since 2010, the International Internet Preservation Consortium (IIPC) Content Development Group organised a collaborative transnational web archive collection on the Games. The collection documents events on and off the field of play through web publications from 86 countries, and 47 languages are represented. Not surprisingly, the largest number of nominations was in French, with 1,181 records, while many languages have as few as one or two records.
The majority of these records were nominated by IIPC members, but a small number of unique records were nominated through the public nomination form launched on the IIPC blog in July 2024.
As with our previous collection on the 2022 Winter Olympics and Paralympics, social media was excluded from this collection, because it was very difficult to preserve any meaningful social media captures through the Archive-It platform at the time of the event. One change to the scoping rules, compared with previous Olympic and Paralympic collections, was to exclude seeds crawled as the seed page plus one click out to all links on that page (e.g. a single news page linking to multiple articles), because these crawls normally pick up a lot of irrelevant content that uses up the data budget.
All seeds added to the crawler were capped at 2 GB. This is generally enough data to capture a standard website, but it means we only have shallow captures of larger, media-heavy websites. Overall, the 3,429 websites and webpages that were archived amounted to 458 GB and 6,315,815 documents.