Given my various interests I am following several groups like <news:comp.lang.tcl> and <news:comp.risks> on NetNews, a global bulletin board system which was started shortly after the internet itself.
Due to the ephemeral nature of the various boards' contents, with most servers keeping messages for only a week or two, accessing older messages means that I either have to go to some website which backs them up, like Google Groups, or save them on my own.
Here I describe how to do the latter, using Tcl and Tcllib.
We will need access to Tcllib's sources even if it is already installed from your favorite distribution's repositories. This is because we will be using the two scripts pullnews and dirstore, found under examples/nntp, to accomplish our task, and I know of no distribution that installs the Tcllib examples.
Edit: Stuart Cassoff tells me that OpenBSD does install the examples, and has done so since 2008.
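One way to get the two scripts is straight from a Tcllib source checkout. The clone URL below points at the GitHub mirror, and the target directory matches the layout used by the script further down; both are merely what I would do, adjust to taste:

git clone --depth 1 https://github.com/tcltk/tcllib.git /tmp/tcllib
mkdir -p "$HOME/Projects/Backups/News/bin"
cp /tmp/tcllib/examples/nntp/pullnews /tmp/tcllib/examples/nntp/dirstore \
   "$HOME/Projects/Backups/News/bin/"
chmod +x "$HOME/Projects/Backups/News/bin/pullnews" \
         "$HOME/Projects/Backups/News/bin/dirstore"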
Next, we need an account, i.e., a user name and a password, with a host serving NetNews via NNTP. If your ISP does not provide one then you have to use one of several specialized providers, like Eternal September.
With that done, below are my script
#!/bin/sh
#--
GROUP=comp.lang.tcl
#--
BASE=$HOME/Projects/Backups/News
ACCOUNT=$BASE/etc/eternal-september.org
SERVER=news.eternal-september.org
SAVETO=$BASE/archive/$GROUP
BINDIR=$BASE/bin
$BINDIR/pullnews -via $ACCOUNT $SERVER $GROUP \
$BINDIR/dirstore $SAVETO
and its account file:
the-user-name
the-user-password
(additional optional lines ignored by pullnews)
Well, not quite. My actual paths are slightly different, I am not telling anybody my account information, and the group name is an argument. Making the equivalent changes is left as an exercise for the reader.
Some explanations and notes are now likely in order:
As written, both pullnews and dirstore were copied into the chosen directory structure, i.e., into $BASE/bin. They could also be copied into a directory already listed on the PATH, or the PATH could be extended to include the directory they reside in. Either way would allow their use in the script without needing an absolute path.
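The PATH variant is a single line in, for example, ~/.profile:

PATH="$HOME/Projects/Backups/News/bin:$PATH"; export PATH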
My host (Shaw; Eternal September as well) requires an account, hence the -via $ACCOUNT in the script. If the actual host is fully open, without the need for any account, then this part of the script has to be removed.
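In other words, for an open host the pullnews call at the end of the script shrinks to just:

$BINDIR/pullnews $SERVER $GROUP \
$BINDIR/dirstore $SAVETO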
I have added an entry to my crontab which runs the script several times a day (actually several times per hour). This ensures that all new(s) articles of the group incrementally accumulate in the backup directory as they arrive.
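In my crontab that looks roughly like the entry below. The name pull-group is a stand-in for whatever you call the wrapper (mine takes the group name as its argument, as mentioned above), and every twenty minutes is just one possible schedule:

*/20 * * * * $HOME/Projects/Backups/News/bin/pull-group comp.lang.tcl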
It is, however, a good idea to make an initial manual run of the script to pull in the saved backlog from the host, as that may take a long time (depending on how much it keeps). Eternal September, for example, has a backlog spanning several years.
Do not forget to create the directory mentioned in SAVETO before the first run. The dirstore script will not create it and will bail out with an error if the directory is missing.
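For the layout above that is a single command, the path being just SAVETO spelled out for comp.lang.tcl:

mkdir -p "$HOME/Projects/Backups/News/archive/comp.lang.tcl"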
Now we have a functioning backup, although our storage system is quite simple: just a directory.
If we want to use a storage system that supports more features, like an index, searching, etc., we have to look under the hood of pullnews a bit to see how it talks to dirstore.
The relevant procedure is store_cmd, which encapsulates the builtin exec. It is called twice.

The first call queries the store for the sequence number of the last stored article, expecting it on stdout. If the result is empty, pullnews will use the sequence number of the oldest article known to the host instead. This is how it pulls the entire backlog on its first run and only the new articles on all subsequent runs.

The second call saves a retrieved article into the store, with the specified sequence number. The article data is presented to the store on stdin.
Not very complicated. Any storage command which follows this simple API can be used as a backend of pullnews.
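To make the API concrete, here is a bare-bones sketch of an alternative store command. The calling convention assumed below (target directory on the command line, sequence number appended by pullnews for the store call) is my reading of the two calls described above; check dirstore itself for the authoritative details, and replace the flat files with whatever indexed or searchable backend you actually want.

#!/bin/sh
# mystore -- minimal store command following the pullnews store API (a sketch).
#   mystore <dir>          query: print the highest stored sequence number
#   mystore <dir> <seqno>  store: read one article from stdin and save it
DIR=$1
SEQ=$2
if [ -z "$SEQ" ]; then
    # Query mode: largest numeric file name; empty output if nothing stored yet.
    ls "$DIR" 2>/dev/null | sort -n | tail -n 1
else
    # Store mode: the article data arrives on stdin.
    cat > "$DIR/$SEQ"
fi

Dropped into $BINDIR and made executable, it slots into the script above in place of dirstore.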
Happy Tcling.