Wasteback Machine is a JavaScript library for measuring the size and composition of archived web pages (mementos) from the Internet Archive's Wayback Machine.
Wasteback Machine retrieves mementos with high fidelity, removing archive linkage and replay-preserving modifications, and excluding replay-induced distortions, while preserving temporal coherence. The library extracts and classifies binary resources (URI-Ms) to accurately measure page size and composition.
The method overcomes the limitations of live-measurement approaches by recognising the unique nature of web archives as re-born digital objects and navigating their complexities to make them analytically tractable. This enables retrospective analysis of websites.
Its modular design supports integration into research workflows, analytics pipelines, and sustainability assessment tools, facilitating the study of web evolution and informing interventions to measure the internet’s environmental impact.
- Retrieve mementos by date or timespan: Selects the nearest memento if the exact timestamp is missing.
- Analyse page composition: Measure sizes of HTML, style sheets, scripts, images, videos, fonts, etc.
- Generate detailed resource (URI-M) lists: Includes URLs, types, and sizes of all URI-Ms.
- Retrieval completeness score: See what percentage of a memento was successfully retrieved.
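The nearest-memento behaviour listed above can be illustrated with a minimal sketch. This is not the library's implementation: `timestampToMillis` and `nearestMemento` are hypothetical helpers, and the sketch assumes mementos are identified by 14-digit `YYYYMMDDhhmmss` timestamps like those shown in this README.

```javascript
// Hypothetical helper (not part of the library): converts a 14-digit
// Wayback-style timestamp (YYYYMMDDhhmmss) to milliseconds since the epoch (UTC).
function timestampToMillis(ts) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!match) throw new Error(`Invalid timestamp: ${ts}`);
  const [, year, month, day, hour, minute, second] = match.map(Number);
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

// Hypothetical helper: returns the timestamp closest in time to `target`,
// or null when no timestamps are available.
function nearestMemento(timestamps, target) {
  if (timestamps.length === 0) return null;
  const targetMs = timestampToMillis(target);
  let best = timestamps[0];
  let bestDiff = Math.abs(timestampToMillis(best) - targetMs);
  for (const ts of timestamps.slice(1)) {
    const diff = Math.abs(timestampToMillis(ts) - targetMs);
    if (diff < bestDiff) {
      best = ts;
      bestDiff = diff;
    }
  }
  return best;
}

// Illustrative timestamps only.
const available = ['19961112181513', '19961219002950', '19970101000000'];
console.log(nearestMemento(available, '19961115000000')); // '19961112181513'
```

On ties the sketch keeps the earlier entry in the list; the library may resolve ties differently.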
To install Wasteback Machine as a dependency for your projects using NPM:
npm install @overbrowsing/wasteback-machine
To install Wasteback Machine as a dependency for your projects using Yarn:
yarn add @overbrowsing/wasteback-machine
Wasteback Machine provides two primary functions:
- Discover available mementos for a URL in a given time range.
- Analyse a specific memento for page size and composition.
import { getMementos } from "@overbrowsing/wasteback-machine";
// Get all mementos for www.nytimes.com between 1996 and 2025
const mementos = await getMementos('https://nytimes.com', 1996, 2025);
console.log(mementos);
Example Output:
[
'19961112181513', '19961112181513', '19961112181513', '19961219002950', ...
]
import { getMementoSizes } from "@overbrowsing/wasteback-machine";
// Analyse www.nytimes.com memento from November 12, 1996
const mementoData = await getMementoSizes(
'https://nytimes.com',
'19961112181513',
{ includeResources: true } // optional: include full resource list
);
console.log(mementoData);
Example Output:
{
url: 'https://nytimes.com',
requestedMemento: '19961112181513',
memento: '19961112181513',
mementoURL: 'https://web.archive.org/web/19961112181513/https://nytimes.com',
sizes: {
html: { bytes: 1653, count: 1 },
stylesheet: { bytes: 0, count: 0 },
script: { bytes: 0, count: 0 },
image: { bytes: 46226, count: 2 },
video: { bytes: 0, count: 0 },
audio: { bytes: 0, count: 0 },
font: { bytes: 0, count: 0 },
flash: { bytes: 0, count: 0 },
plugin: { bytes: 0, count: 0 },
data: { bytes: 0, count: 0 },
other: { bytes: 0, count: 0 },
total: { bytes: 47879, count: 3 }
},
completeness: '100%',
resources: [
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/index.gif',
type: 'image',
size: 45259
},
{
url: 'https://web.archive.org/web/19961112181513im_/http://www.nytimes.com/free-images/marker.gif',
type: 'image',
size: 967
}
]
}
A demo is available in examples/demo.js. It integrates CO2.js with the 1Byte model to estimate the environmental impact of a memento.
Run the demo with Node.js:
node examples/demo.js <URL> <Year YYYY> [Month MM] [Day DD]
Parameters:
- <URL>: Target website to analyse
- <Year YYYY>: Year of interest
- [Month MM]: Optional month (defaults to January (01) if omitted)
- [Day DD]: Optional day (defaults to 1st (01) if omitted)
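As a rough illustration of how these parameters could map onto a memento timestamp, the sketch below builds a 14-digit timestamp from a year with optional month and day. `buildTimestamp` is a hypothetical helper, not taken from examples/demo.js, and padding the time of day with zeros is an assumption.

```javascript
// Hypothetical helper (not from examples/demo.js): builds a 14-digit
// timestamp from year, month, and day. Month and day default to "01",
// mirroring the defaults described above. The time of day is assumed
// to be midnight (000000).
function buildTimestamp(year, month = '01', day = '01') {
  const y = String(year);
  const m = String(month).padStart(2, '0');
  const d = String(day).padStart(2, '0');
  if (!/^\d{4}$/.test(y)) throw new Error(`Invalid year: ${year}`);
  if (!/^\d{2}$/.test(m) || Number(m) < 1 || Number(m) > 12) {
    throw new Error(`Invalid month: ${month}`);
  }
  if (!/^\d{2}$/.test(d) || Number(d) < 1 || Number(d) > 31) {
    throw new Error(`Invalid day: ${day}`);
  }
  return `${y}${m}${d}000000`;
}

console.log(buildTimestamp('1996', '11', '12')); // '19961112000000'
console.log(buildTimestamp('1996'));             // '19960101000000'
```

A timestamp built this way will rarely match a capture exactly, which is where nearest-memento selection comes in.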
Example:
# Analyse www.nytimes.com memento from November 12, 1996
node examples/demo.js www.nytimes.com 1996 11 12
After running the demo, you will receive a structured report for the desired memento:
- Memento information:
  - Retrieved memento URL
  - Completeness of retrieval (%)
- Page size results:
  - Total page size (KB)
  - Estimated equivalent emissions per page visit (g CO₂e)
- Page composition results:
  - Count of URI-Ms by type (images, scripts, stylesheets, etc.)
  - Total size per type (KB) and percentage of total page size (%)
  - Estimated equivalent emissions per type per page visit (g CO₂e)
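The size and percentage figures in such a report could be derived from a `sizes` object shaped like the one returned by `getMementoSizes`. The following is a minimal sketch, not the demo's actual code: `summariseComposition` is a hypothetical helper, the sample object is abridged from the output shown earlier, and 1 KB is assumed to be 1024 bytes.

```javascript
// Sample `sizes` object, abridged from the getMementoSizes example output.
const sizes = {
  html: { bytes: 1653, count: 1 },
  script: { bytes: 0, count: 0 },
  image: { bytes: 46226, count: 2 },
  total: { bytes: 47879, count: 3 },
};

// Hypothetical helper: returns one row per resource type that has at least
// one URI-M, with its size in KB and its share of the total page size.
function summariseComposition(sizes) {
  const totalBytes = sizes.total.bytes;
  return Object.entries(sizes)
    .filter(([type, entry]) => type !== 'total' && entry.count > 0)
    .map(([type, { bytes, count }]) => ({
      type,
      count,
      kilobytes: (bytes / 1024).toFixed(2),
      share: totalBytes > 0 ? ((bytes / totalBytes) * 100).toFixed(1) : '0.0',
    }));
}

for (const row of summariseComposition(sizes)) {
  console.log(`${row.type}: ${row.count} URI-M(s), ${row.kilobytes} KB (${row.share}%)`);
}
// html: 1 URI-M(s), 1.61 KB (3.5%)
// image: 2 URI-M(s), 45.14 KB (96.5%)
```

Emissions estimates are left out of the sketch because they depend on the CO2.js model configuration used by the demo.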
Example Output:
# Results for www.nytimes.com memento from November 12, 1996
Retrieved Memento:
🔗 Memento URL: https://web.archive.org/web/19961112181513/https://www.nytimes.com
✅ Completeness: 100%
Page Size Results:
📊 Data Transfer: 46.76 KB
🌍 Page CO₂e: 0.014 g
Page Composition Results:
📁 HTML
Count: 1
Size: 1.61 KB (3.5%)
CO₂e: 0.000 g
📁 IMAGE
Count: 2
Size: 45.14 KB (96.5%)
CO₂e: 0.013 g
For details on Wasteback Machine’s methodology, assumptions, and limitations, please refer to our working paper. It provides guidance on the library’s intended use, interpretive constraints, and best practices for integrating results into research or sustainability assessments.
For questions or access before publication, please contact [email protected].
Important
This library is provided for informational and research purposes only. The authors make no guarantees about the accuracy of the results and disclaim any liability for their use.
Contributions are welcome! Please submit an issue or a pull request.
Wasteback Machine is licensed under Apache 2.0. For full licensing details, see the LICENSE file.
The Wayback Machine, Wayback CDX Server API, and Wayback Replay API are provided by the Internet Archive and are governed by their Terms of Use.