mwoffliner is a tool for making a local HTML snapshot of any online
(recent) MediaWiki instance. It goes through all articles (or a
selection, if specified) and writes the HTML and pictures to a local
directory. It has mainly been tested against Wikimedia projects such as
Wikipedia and Wiktionary, but it should also work with any recent
MediaWiki.
To use mwoffliner, you need a recent version of Node.js and a POSIX
system (like GNU/Linux). A few other dependencies are also required;
they are described below.
Most of the instructions are given for a Debian-based OS.
First, install Node.js:
$curl -sL https://deb.nodesource.com/setup_6.x | sudo -E bash -
$sudo apt-get install -y nodejs
mwoffliner post-processes the images it downloads, so the following
tools are required: jpegoptim, advdef, gifsicle, pngquant, imagemagick.
$sudo apt-get install jpegoptim advancecomp gifsicle pngquant imagemagick
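Before starting a long run, it can be worth checking that these tools are actually reachable from the PATH. The following is a minimal sketch, not part of mwoffliner: the function name findMissingBinaries is hypothetical, and it assumes that "convert" is the ImageMagick executable that matters on your system.

```javascript
// Sketch: report which image tools cannot be found on the PATH.
const { spawnSync } = require('child_process')

// Executable names to look for. 'convert' is an assumption standing in
// for ImageMagick; adjust the list if your setup differs.
const REQUIRED_BINARIES = ['jpegoptim', 'advdef', 'gifsicle', 'pngquant', 'convert']

// Returns the subset of `names` that the POSIX shell cannot resolve.
function findMissingBinaries (names) {
  return names.filter((name) => {
    // `command -v` exits with a non-zero status when the name is unknown.
    // The name is passed as $1 so it is never interpreted as shell code.
    const result = spawnSync('sh', ['-c', 'command -v "$1"', 'sh', name], { stdio: 'ignore' })
    return result.status !== 0
  })
}

const missing = findMissingBinaries(REQUIRED_BINARIES)
console.log(missing.length === 0 ? 'All image tools found.' : 'Missing: ' + missing.join(', '))
```

Note that advdef is provided by the advancecomp package, which is why the package list and the list of tools differ.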
mwoffliner is designed to write its snapshots in the ZIM archive file
format, using zimwriterfs. See http://www.openzim.org/ for more details.
FIXME: The instructions below are insufficient to build zimwriterfs. Please follow the instructions in https://github.com/openzim/zimwriterfs/blob/master/docker/Dockerfile
$sudo apt-get install liblzma-dev libmagic-dev zlib1g-dev libgumbo-dev libzim-dev libicu-dev
$git clone https://github.com/openzim/zimwriterfs.git
$cd zimwriterfs
$./autogen.sh
$./configure
$make
$sudo make install
For the complete procedure, follow the official installation documentation: https://raw.githubusercontent.com/openzim/zimwriterfs/master/README.md
Redis is a software daemon that stores large quantities of key=value
pairs. mwoffliner uses it as a cache.
You can install it from source:
$wget http://download.redis.io/releases/redis-3.2.8.tar.gz
$tar xzf redis-3.2.8.tar.gz
$cd redis-3.2.8
$make
or directly from the repository:
$sudo apt-get install redis-server
Here are the important parts of the configuration (/etc/redis/redis.conf):
unixsocket /dev/shm/redis.sock
unixsocketperm 777
save ""
appendfsync no
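To confirm that a configuration file carries these four settings, a small check along the following lines may help. It is only a sketch: parseRedisConf and findMismatches are hypothetical helpers, the parser handles just the simple "directive value" form, and it assumes that the last occurrence of a directive is the one that counts.

```javascript
// Settings recommended above, as directive -> expected raw value.
const EXPECTED_SETTINGS = {
  unixsocket: '/dev/shm/redis.sock',
  unixsocketperm: '777',
  save: '""',
  appendfsync: 'no'
}

// Parses redis.conf text into { directive: rawValue }, skipping blank
// lines and comments. Later occurrences overwrite earlier ones.
function parseRedisConf (text) {
  const settings = {}
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim()
    if (line === '' || line.startsWith('#')) continue
    const [, directive, value] = line.match(/^(\S+)\s*(.*)$/)
    settings[directive.toLowerCase()] = value
  }
  return settings
}

// Returns the directives whose value differs from the recommendation.
function findMismatches (text) {
  const actual = parseRedisConf(text)
  return Object.keys(EXPECTED_SETTINGS).filter(
    (directive) => actual[directive] !== EXPECTED_SETTINGS[directive]
  )
}

// Example: a configuration that still has a snapshot rule enabled.
const sample = [
  '# sample configuration',
  'unixsocket /dev/shm/redis.sock',
  'unixsocketperm 777',
  'save 900 1',
  'appendfsync no'
].join('\n')
console.log(findMismatches(sample)) // reports the "save" directive
```

To check a real file, pass the result of fs.readFileSync('/etc/redis/redis.conf', 'utf8') to findMismatches; reading that file may require elevated permissions.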
We also recommend using a DNS cache such as nscd.
Then install mwoffliner itself, along with its dependencies:
$sudo npm -g install mwoffliner
or, if you do not want to install it as root:
$npm install mwoffliner
When you are done with the installation, you can start mwoffliner. There are two ways to use mwoffliner.
If it was installed globally as root (and is therefore in your $PATH):
mwoffliner
otherwise:
node ./node_modules/mwoffliner/bin/mwoffliner.script.js
This prints the usage information of the command.
If you want to run mwoffliner the npm way, you must define some npm
scripts in package.json. For example, add the following scripts section
to your package.json:
"scripts": {
  "mwoffliner": "mwoffliner",
  "create_archive": "mwoffliner --mwUrl=https://en.wikipedia.org/ --adminEmail=admin@example.org",
  "create_mywiki_archive": "mwoffliner --mwUrl=https://my.wiki.url/ --adminEmail=admin@example.org"
}
Now you are able to run mwoffliner through npm:
$npm run mwoffliner -- --mwUrl=https://en.wikipedia.org/ --adminEmail=admin@example.org
The first "--" tells npm to pass the arguments that follow it to the
mwoffliner module.
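The effect of the separator can be pictured with a deliberately simplified model. This is not how npm is implemented: expandNpmRun is a hypothetical function that covers only the case described here, ignores quoting and escaping, and drops anything placed before the separator.

```javascript
// Simplified model of `npm run <script> -- <args>`: everything after the
// first "--" is appended to the command defined for that script.
function expandNpmRun (scripts, argv) {
  const [scriptName, ...rest] = argv
  const separator = rest.indexOf('--')
  // Without a separator, this model passes nothing through.
  const extraArgs = separator === -1 ? [] : rest.slice(separator + 1)
  return [scripts[scriptName], ...extraArgs].join(' ')
}

const scripts = { mwoffliner: 'mwoffliner' }
console.log(expandNpmRun(scripts, ['mwoffliner', '--', '--mwUrl=https://en.wikipedia.org/']))
// prints "mwoffliner --mwUrl=https://en.wikipedia.org/"
```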
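To use mwoffliner as a library, include the following in a .js file of your project:
// Path to the mwoffliner library entry point, relative to your script.
const mwoffliner = require('./lib/mwoffliner.lib.js')

// Same options as on the command line.
const parameters = {
  mwUrl: 'https://en.wikipedia.org/',
  adminEmail: 'admin@example.org' // placeholder: use your own address
}

mwoffliner.execute(parameters)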
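The keys of the parameters object appear to mirror the command-line options (mwUrl corresponds to --mwUrl). Assuming that this mapping holds one to one, which is not stated here for every option, a parameters object can be turned into the equivalent argument list as sketched below. toCliArgs is a hypothetical helper, not part of mwoffliner, and the email address is a placeholder.

```javascript
// Converts { key: value } into ['--key=value', ...], preserving key order.
function toCliArgs (parameters) {
  return Object.entries(parameters).map(([key, value]) => `--${key}=${value}`)
}

const exampleParameters = {
  mwUrl: 'https://en.wikipedia.org/',
  adminEmail: 'admin@example.org' // placeholder address
}

console.log(['mwoffliner', ...toCliArgs(exampleParameters)].join(' '))
// prints "mwoffliner --mwUrl=https://en.wikipedia.org/ --adminEmail=admin@example.org"
```

Keeping a single parameters object and deriving the command line from it avoids maintaining the same options in two places when both the library and the command are used.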