• Ilya Kreymer
Introducing ArchiveWeb.page Chrome Extension
I am excited to announce the launch of ArchiveWeb.page, a brand-new high-fidelity web archiving system available as a Chrome extension from the Chrome Web Store.
The extension has been tested in the latest versions of Chrome, as well as in the Edge and Brave browsers.
In classic Webrecorder style, the extension allows users to ‘record’ highly interactive websites, including social media, video, customized content, and even local intranet content.
When the original webrecorder.io was launched nearly six years ago, the goal was to allow users to record/capture exactly what is loaded in their browser. At the time, it was not possible to do this entirely with a browser extension, and an outside proxy server (running on webrecorder.io) was necessary. Now, thanks to the evolution of browser technologies, this original vision of archiving entirely in your browser can finally be realized!
The ArchiveWeb.page extension turns the browser into a full web archiving system, allowing users to turn on ‘recording’ mode in any tab, which will then capture/record all the elements of a page exactly as they are loaded. The archived data is then stored in the browser itself, and can be replayed/accessed even when offline. ArchiveWeb.page builds on and complements the ReplayWeb.page system, announced last year.
User Guide
To get users started with the extension right away, we’ve also launched a detailed User Guide, created by our Community Manager, Lorena Ramírez-López.
Read on below for an overview of some key features in ArchiveWeb.page.
Archiving Flash with Ruffle Emulator
ArchiveWeb.page embeds the Ruffle emulator, allowing users to archive and replay Flash-based works. Ruffle is automatically enabled on pages that have Flash.
Not all Flash pages will work with Ruffle, but many will. See our ongoing efforts to ensure Flash remains accessible.
Page-oriented Archiving and Deduplication
In ArchiveWeb.page, the smallest unit is the page. The extension archive keeps track of which resources are loaded from which page. This allows for individual pages to be downloaded, and deleted, as necessary, and will help ensure archived pages are accurately replayed. Resources shared across multiple pages are automatically deduplicated to save storage.
This is a bit different than in Webrecorder Desktop, where the smallest unit was a session and individual pages could not be deleted or separated. Support for removing individual pages was an oft-requested feature, and this is now available in ArchiveWeb.page.
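The page-oriented model with deduplication can be illustrated with a small sketch. This is not the extension's actual storage code: the `PageArchive` class, its method names, and the choice of SHA-256 content digests as the deduplication key are all assumptions made for illustration. The idea is that each page keeps a list of the resources it loaded, while the payloads themselves are stored once and shared.

```python
import hashlib
from collections import defaultdict


class PageArchive:
    """Toy page-oriented store (hypothetical, for illustration only)."""

    def __init__(self):
        # digest -> payload bytes; each distinct payload is stored once
        self.blobs = {}
        # page URL -> {resource URL: digest of the payload it loaded}
        self.page_resources = defaultdict(dict)

    def record(self, page_url, resource_url, payload):
        """Attach a resource to a page, reusing an identical stored payload."""
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)
        self.page_resources[page_url][resource_url] = digest

    def delete_page(self, page_url):
        """Remove one page and drop payloads no other page still uses."""
        self.page_resources.pop(page_url, None)
        still_used = {
            digest
            for resources in self.page_resources.values()
            for digest in resources.values()
        }
        for digest in list(self.blobs):
            if digest not in still_used:
                del self.blobs[digest]


archive = PageArchive()
archive.record("https://example.com/a", "https://example.com/a", b"<html>A</html>")
archive.record("https://example.com/a", "https://example.com/style.css", b"body{}")
archive.record("https://example.com/b", "https://example.com/b", b"<html>B</html>")
archive.record("https://example.com/b", "https://example.com/style.css", b"body{}")
```

In this sketch the shared stylesheet is stored once even though both pages reference it, and deleting one page leaves the stylesheet in place for the other.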
Full-Text for Web Pages and PDFs
ArchiveWeb.page includes built-in full-text search support. When recording a page, the text of the page is automatically extracted and indexed (when the page is first loaded and again when leaving the page). Text for any PDFs recorded is also extracted.
When replaying pages, enter text queries in the location bar to search pages by text.
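As a rough illustration of how extracted page text can be indexed and searched, here is a minimal inverted-index sketch. It is an assumption-laden toy, not the search implementation used by the extension: the `TextIndex` class, the tokenizer, and the "all query terms must match" rule are hypothetical choices. Calling `add_page` a second time for the same URL replaces the earlier text, loosely mirroring the way text is extracted on load and again when leaving the page.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TextIndex:
    """Toy inverted index mapping terms to page URLs (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> URLs containing it
        self.page_terms = {}              # URL -> terms currently indexed

    def add_page(self, url, text):
        """Index a page, replacing any text indexed earlier for the same URL."""
        for term in self.page_terms.get(url, set()):
            self.postings[term].discard(url)
        terms = set(tokenize(text))
        self.page_terms[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def search(self, query):
        """Return the URLs whose text contains every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results


index = TextIndex()
index.add_page("https://example.com/flash", "An archived Flash exhibition page")
index.add_page("https://example.com/feed", "An archived social media feed with video")
matches = index.search("archived flash")
```

Here `matches` contains only the Flash page, since it is the only one whose text includes both query terms.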
Download Archives in WACZ or WARC
The extension fully supports exporting entire web archive collections or individual pages in the new WACZ Format 1.0. This format, which contains WARCs, indices and other data, makes it easy to share web archives and load them quickly using ReplayWeb.page.
See our blog post on the WACZ format.
Of course, the extension also supports downloading plain WARC files.
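To make the idea of a single package that bundles WARCs, indices and other data more concrete, the sketch below builds a small ZIP file in memory and groups its entries by top-level folder. The entry names used here (`archive/`, `indexes/`, `pages/`, `datapackage.json`) and both helper functions are illustrative assumptions; consult the WACZ specification for the authoritative layout.

```python
import io
import zipfile
from collections import defaultdict


def build_demo_package():
    """Create a tiny ZIP in memory with an assumed, illustrative layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.warc", b"WARC/1.1\r\n")  # placeholder, not a valid record
        zf.writestr("indexes/index.cdx", b"")              # placeholder index
        zf.writestr("pages/pages.jsonl", b"{}\n")          # placeholder page list
        zf.writestr("datapackage.json", b"{}")             # placeholder metadata
    return buf.getvalue()


def summarize_package(data):
    """Group the entries of a ZIP-based package by top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            top, sep, _rest = name.partition("/")
            groups[top if sep else "(root)"].append(name)
    return dict(groups)


summary = summarize_package(build_demo_package())
```

Because everything travels in one file, a replay tool can read the index entries first and then fetch only the records it needs, which is one plausible reason such a package loads quickly.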
Peer-to-peer sharing using IPFS
The ArchiveWeb.page extension includes experimental peer-to-peer sharing of web archives, using IPFS. This feature allows users to share a web archive collection directly from their browser!
ReplayWeb.page has been updated to support loading web archives directly from IPFS, allowing archives shared from the extension to be loaded quickly by others, without having to download and send full WACZ files.
This feature is still experimental; see the guide page on sharing for some caveats.
Video
Here’s a brief video of the ArchiveWeb.page extension being used to archive a Twitter feed, including video, archive a MOMA exhibition page with Flash, replay each page, search by text, and then download selected pages in WACZ format:
Further Work / Coming Soon
This is only the initial release of ArchiveWeb.page. Here is some of the additional work in the pipeline for future releases.
ArchiveWeb.page Desktop App
For those that may prefer a standalone desktop app instead of an extension, we’re also working on an ArchiveWeb.page desktop app.
This app will share the same system as the extension, but will run as a standalone desktop app. The ArchiveWeb.page App will replace the existing Webrecorder Desktop app, and we hope to offer a migration path to the new app once it's available. Stay tuned for more details.
A development version can be built locally using the ArchiveWeb.page GitHub repository.
Autopilot System
The autopilot system from Webrecorder Desktop, which runs automated behaviors on certain sites, is not yet in this version of the extension, but rest assured that we plan to add this system to ArchiveWeb.page, both extension and app, in an upcoming release. We’ll be sure to make an announcement once it is ready!
Feedback
Try out archiveweb.page, read the guide and let us know if you have any feedback on this new tool! We want to hear from you!
You can reach out via the forum or attend our upcoming community call.
Have thoughts or comments on this post? Feel free to start a discussion on our forum!