Webrecorder
Web archiving for all!

Browsertrix is the high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all! Archive whole websites with automated crawling, analyze them with our assistive QA tools, and combine them with previously captured uploaded content to share with others.

Read more and sign up to use Browsertrix
View Browsertrix Documentation
An animated gif of crawling a website with Browsertrix. The user watches as the crawler visits multiple webpages.

ArchiveWeb.page is a Chrome extension and standalone desktop app that allows you to archive websites interactively as you browse. The extension works in any Chromium-based browser (Chrome, Brave, Edge), and the desktop app provides the same interactive high-fidelity archiving functionality as a standalone application.

See the documentation for more info.

Download the ArchiveWeb.page Chrome Extension
Download the ArchiveWeb.page Desktop App
View ArchiveWeb.page Documentation
An animated gif of a user downloading a WACZ web archive to their computer.

ReplayWeb.page provides an embeddable web archive viewer for WARC and WACZ web archives hosted on the web, your local computer, or Google Drive. ReplayWeb.page is also available as a PWA or desktop application for offline use.

See the documentation for more info.

Open ReplayWeb.page
Download the ReplayWeb.page Desktop App
View ReplayWeb.page Documentation
An animated gif of a user loading a file into ReplayWeb.page and viewing it.

pywb toolkit is a full-featured, advanced web archiving capture and replay framework for Python. It provides command-line tools and an extensible framework for high-fidelity web archive access and capture, including localization and access control. A subset of features provides the basic functionality also known as a 'wayback machine', but pywb includes additional features to create new web archives and to manage existing collections.

View pywb Documentation
An animated gif of a user scrolling pywb's documentation.

Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a single crawl in a single Docker container.

Running crawls with Browsertrix Crawler requires familiarity with the command line and Docker.

View Browsertrix Crawler Documentation
A command line screencast of Browsertrix Crawler running in a terminal.

All Tools

In addition to the key tools above, we maintain numerous smaller tools as part of the web archiving ecosystem. Take a look at these tools if you are interested in deploying web archiving tools on your own or integrating them into other projects.


All currently maintained Webrecorder tools are listed below. Select one of the categories to further filter this list.




archiveweb.page

A Chrome extension and desktop app for capturing and replaying pages directly using a browser

browsertrix-behaviors

A set of behaviors for automating interactions with the browser, including generic behaviors (playing video, scrolling) and site-specific behaviors, such as for social media

browsertrix-crawler

A self-contained crawling system that runs a high-fidelity crawl in a single Docker container

oldweb.today

An integrated browser emulation system for running in-browser emulators connected to web archives

pywb

The core web archive toolkit, including web archive replay, access, and collection management

pywb-remote-browsers

A Docker Compose-based system for running remote browsers (including Flash and Java support) connected to web archives

remote-desktop-server

A set of Docker containers for VNC and WebRTC streaming. A component of pywb-remote-browsers.

replayweb.page

A serverless web and desktop app for viewing web archives directly in the browser

shepherd

A Docker container orchestration system for launching 'flocks' of Docker containers on demand. Part of the Remote Browser system.

shepherd-client

A JS frontend for embedding remote browsers in Conifer. Part of the Remote Browser system

wabac.js

A service worker-based web archive replay system. The backend for ReplayWeb.page

wacz-format

A new specification for a portable Web Archive Collection Zip (WACZ) format, along with a Python library

warcio

A fast, standalone way to read and write the WARC format commonly used in web archives

warcio.js

A port of Python warcio to JavaScript. Supports reading and writing WARC files in the browser and in Node.

warcit

A command-line tool to convert on-disk directories of web documents (commonly HTML, web assets, and any other data files) into ISO-standard web archive (WARC) files.

wombat.js

The client-side JavaScript rewriting system used in pywb and wabac.js
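
Several of the tools above read or write WACZ files, which the wacz-format entry describes as a portable ZIP-based collection format. As a rough illustration of what "ZIP-based" means in practice, the sketch below uses only Python's standard library to build a tiny stand-in archive in memory and list the resources named in its manifest. It does not use the wacz-format library, and the file names (datapackage.json, archive/, indexes/, pages/pages.jsonl) and the manifest's "resources" list reflect one reading of the specification and should be treated as assumptions. The helper names build_sample_wacz and list_wacz_resources are hypothetical, and the file contents are placeholders rather than valid WARC or index data.

```python
import io
import json
import zipfile


def build_sample_wacz() -> bytes:
    """Build a tiny WACZ-like ZIP in memory.

    The layout is an assumption based on the WACZ specification;
    the file contents are placeholders, not real archive data.
    """
    paths = [
        "archive/data.warc.gz",
        "indexes/index.cdx",
        "pages/pages.jsonl",
    ]
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for path in paths:
            zf.writestr(path, b"placeholder")
        # Manifest listing every resource stored in the package.
        manifest = {
            "profile": "data-package",
            "resources": [{"path": path} for path in paths],
        }
        zf.writestr("datapackage.json", json.dumps(manifest))
    return buf.getvalue()


def list_wacz_resources(wacz_bytes: bytes) -> list:
    """Return manifest resource paths that are actually present in the ZIP."""
    with zipfile.ZipFile(io.BytesIO(wacz_bytes)) as zf:
        manifest = json.loads(zf.read("datapackage.json"))
        members = set(zf.namelist())
    return [r["path"] for r in manifest["resources"] if r["path"] in members]


if __name__ == "__main__":
    for path in list_wacz_resources(build_sample_wacz()):
        print(path)
```

Because the package is an ordinary ZIP file, the same approach works on a file from disk by passing its path to zipfile.ZipFile directly. For real archives, the maintained tools listed above are the better choice.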