• Ilya Kreymer
Over the years, the Webrecorder project has developed a lot of tools to make web archiving easier and more accessible for all. To continue pushing the boundaries of high-fidelity web archiving and make tools that are easy to use and easy to maintain, it is sometimes necessary to discontinue older tools and focus on new ones.
If you are currently using the following tools, we recommend transitioning to the newer tools mentioned below.
- If you’re using Webrecorder Desktop, you should switch to the ArchiveWeb.page Extension or Desktop App. See below for more details on ArchiveWeb.page. Webrecorder Desktop development has been discontinued.
- If you’re using Browsertrix, you should switch to Browsertrix Crawler, a more modular, self-contained crawler. See below for more details on Browsertrix Crawler.
- If you’re using Webrecorder Player, you should switch to the ReplayWeb.page App or use the https://replayweb.page web site. ReplayWeb.page was released last year, and Webrecorder Player development was also discontinued last year.
ArchiveWeb.page Desktop App Now Available
Last month, we announced the release of the ArchiveWeb.page Chrome Extension.
During our last community call, we also announced an initial beta release of the ArchiveWeb.page App, which complements the extension.
The desktop app uses the same code base as the extension and updates will be released to both at around the same time.
Extension vs App
The extension is preferable for many use cases, as it integrates directly with the browser and may make it easier to start recording. When using the extension in their existing Chromium-based browser, users can archive exactly what they see, including all sites they’re already logged into.
The app may be useful in cases where the extension has difficulty, particularly due to certain restrictions in the browser. For example, in Chrome, many Google sites have native apps, and security settings may prevent archiving Google Docs and similar sites. Archiving these sites should work in the standalone app.
The extension does require a Chromium-based browser (Chrome, Brave, Edge), so the app may be an alternative for those who do not wish to install one of these browsers.
Users who are familiar with Webrecorder Desktop, or who have an existing workflow built around it, should find the ArchiveWeb.page App easy to use.
Webrecorder is committed to making it as easy as possible to archive any site, and will continue to offer ArchiveWeb.page as both an app and an extension.
Deprecation of Webrecorder Desktop
With the release of ArchiveWeb.page App and extension, the existing Webrecorder Desktop app is now deprecated.
Webrecorder Desktop was developed by migrating a system designed to be a cloud-based service into an app, which resulted in an overly complex architecture that was difficult to maintain. While the app was based on Electron, it also bundled two separate native executables, a Python app and an external Redis binary, which made it very hard to keep up to date with the latest macOS and Windows releases.
The ArchiveWeb.page app and extension are designed from the ground up to run as local archiving systems on your machine.
If you are starting a new archive, please use ArchiveWeb.page.
If you have existing collections in Webrecorder Desktop, you can export them as WARC files and view via ReplayWeb.page.
The ArchiveWeb.page app and extension will both have a way to import WARC files from Webrecorder Desktop in an upcoming update.
We plan to release more instructions for how to migrate in the near future!
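Until those instructions are available, it can be useful to check that an exported WARC file looks intact before loading it into ReplayWeb.page. The following is a minimal sketch, not part of any Webrecorder tool: it uses only the Python standard library and assumes the standard WARC record layout (a `WARC/` version line, `Name: value` header lines, a blank line, then `Content-Length` bytes of content). The helper names `open_warc`, `iter_warc_headers` and `_sample_record` are hypothetical, and the parser ignores folded header lines and treats header names as case-sensitive for simplicity.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator


def open_warc(path: str) -> BinaryIO:
    """Open a WARC file, handling gzip-compressed (.warc.gz) files as well."""
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rb")
    return open(path, "rb")


def iter_warc_headers(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the WARC header fields of each record, skipping the content."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip over the record content so the next read starts at the next record.
        stream.read(int(headers.get("Content-Length", "0")))
        yield headers


def _sample_record(record_type: str, uri: str, body: bytes) -> bytes:
    """Build a tiny in-memory record for demonstration purposes only."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {record_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


# Demonstration on in-memory data; for a real export, pass open_warc(path) instead.
sample = (
    _sample_record("request", "https://example.com/", b"GET / HTTP/1.1\r\n\r\n")
    + _sample_record("response", "https://example.com/", b"hello")
)
records = list(iter_warc_headers(io.BytesIO(sample)))
for headers in records:
    print(headers["WARC-Type"], headers.get("WARC-Target-URI"))
```

For an actual export, something like `iter_warc_headers(open_warc("my-collection.warc.gz"))` would list the records, where the file name is a placeholder for whatever Webrecorder Desktop produced. A dedicated WARC library is a better choice for anything beyond a quick sanity check.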
Crawling Tools Update: Refactoring Browsertrix into the new Browsertrix Crawler
With the release of the modular Browsertrix Crawler crawling system, development of the older, all-in-one Browsertrix has ended. The original system had too many ‘moving parts’: a crawler, a remote browser system, a behavior system, a scheduler, a UI and a CLI tool, all split across many Docker containers and repos.
All of those are important, but it became difficult to maintain all of the components as designed. The idea of Browsertrix lives on in a more modular setup with Browsertrix Crawler, which focuses on the core use case of being able to run an automated high-fidelity crawl of a small or medium-sized site.
Additional features, such as a scheduler or a UI, may be added in the future, but will be separate from Browsertrix Crawler. Above all, we want the core Browsertrix Crawler to be easy to use and focus on providing high-fidelity crawling via a single command.
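To give a sense of what "a single command" might look like in practice, here is a rough sketch that assembles a Docker invocation of the crawler from Python. It assumes the `webrecorder/browsertrix-crawler` image, a `/crawls/` output volume, and the `--url` and `--collection` options; these may differ between releases, so check the repository README before relying on them. The function name `build_crawl_command` and the example values are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List


def build_crawl_command(url: str, collection: str, output_dir: Path) -> List[str]:
    """Assemble a Docker command line for one crawl (assumed flags, see above)."""
    return [
        "docker", "run",
        # Mount a local directory so crawl output is kept after the container exits.
        "-v", f"{output_dir.resolve()}:/crawls/",
        "webrecorder/browsertrix-crawler",
        "crawl",
        "--url", url,
        "--collection", collection,
    ]


# Only build and print the command here; nothing is executed.
command = build_crawl_command("https://example.com/", "example-crawl", Path("crawls"))
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run(command, check=True)` on a machine with Docker installed. Building the command as a list rather than a string avoids shell-quoting problems with URLs that contain characters such as `&` or `?`.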
See the Browsertrix Crawler repository issues for more details on current development of the crawler.
Have thoughts or comments on this post? Feel free to start a discussion on our forum!