Saving NetNews with Tcl(lib)

Given my various interests, I follow several groups, like <news:comp.lang.tcl> and <news:comp.risks>, on NetNews, a global bulletin board system started shortly after the internet itself.

Due to the ephemeral nature of the various boards' contents, with most servers keeping messages for only a week or two, any access to older messages means that I either have to go to some website which backs them up, like Google Groups, or save them on my own.

Here I describe how to do the latter, using Tcl and Tcllib.

We will need access to Tcllib's sources even if it is already installed from your favorite distribution's repositories. This is because we will be using the two scripts pullnews and dirstore found under examples/nntp to accomplish our task, and I know of no distribution that installs the Tcllib examples.

Edit: Stuart Cassoff tells me that OpenBSD does install the examples, since 2008.
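
If you do not have a Tcllib checkout at hand, something along the following lines will do. This is only a sketch: the GitHub mirror is merely one way to obtain the sources, and the target directory matches the BINDIR used in the script further below.

# Sketch: fetch the Tcllib sources and copy the two example scripts
# into the bin directory used by the backup script below.
git clone https://github.com/tcltk/tcllib.git
mkdir -p "$HOME/Projects/Backups/News/bin"
cp tcllib/examples/nntp/pullnews tcllib/examples/nntp/dirstore \
   "$HOME/Projects/Backups/News/bin/"
chmod +x "$HOME/Projects/Backups/News/bin/pullnews" \
         "$HOME/Projects/Backups/News/bin/dirstore"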

Next, we need an account, i.e., a user name and a password, with a host serving NetNews via NNTP. If your ISP does not provide one then you have to use one of several specialized providers, like Eternal September.

With that done, below is my script

#!/bin/sh
#-- the group to archive
GROUP=comp.lang.tcl
#-- locations: account file, news host, archive directory, helper scripts
BASE=$HOME/Projects/Backups/News
ACCOUNT=$BASE/etc/eternal-september.org
SERVER=news.eternal-september.org
SAVETO=$BASE/archive/$GROUP
BINDIR=$BASE/bin

#-- pull the new articles and hand them to dirstore for saving
$BINDIR/pullnews -via $ACCOUNT $SERVER $GROUP \
    $BINDIR/dirstore $SAVETO

and its account file:

the-user-name
the-user-password
(additional optional lines ignored by pullnews)

Well, not quite. My actual paths are slightly different, I am not telling anybody my account information, and the group name is an argument. Making the equivalent changes is left as an exercise for the reader.

Some explanations and notes are now likely in order:

  • As written, the script assumes that both pullnews and dirstore have been copied into the chosen directory structure, i.e. into $BINDIR.

    They could also be copied into a directory already listed in the PATH, or the PATH could be extended to include the directory they reside in. Either way would allow using them in the script without an absolute path.

  • My host (Shaw, and Eternal September as well) requires an account, and thus the

    -via $ACCOUNT

    in the script. If the actual host is fully open, without the need for any account, then this part of the script has to be removed.

  • I have added an entry to my crontab which runs the script several times a day (actually several times per hour); an example entry is shown after this list. This ensures that all new(s) articles of the group incrementally accumulate in the backup directory as they arrive.

  • It is, however, a good idea to make an initial manual run of the script to pull in the saved backlog from the host as that may take a long time (depending on how much it keeps). Eternal September, for example, has a backlog spanning several years.

  • Do not forget to create the directory mentioned in SAVETO before the first run; a suitable command is shown after this list. The dirstore script will not create it and will bail out with an error if the directory is missing.
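
For reference, a matching crontab entry looks roughly like the following; the twenty-minute schedule is merely an example, and pull-news is a hypothetical name for the script shown above. Add it with crontab -e.

# run the backup script every twenty minutes
*/20 * * * *  $HOME/Projects/Backups/News/bin/pull-news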
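
The archive directory itself, i.e. the value of SAVETO in the script, can be created beforehand with something like:

mkdir -p "$HOME/Projects/Backups/News/archive/comp.lang.tcl"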

Now we have a functioning backup, although our storage system is quite simple - just a directory.

If we want to use a storage system that supports more features, like an index, searching, etc., we have to look under the hood of pullnews a bit to see how it talks to dirstore.

The relevant procedure is store_cmd, which encapsulates the builtin exec and invokes the external storage command. It is called in two ways:

  1. This call queries the store for the sequence number of the last stored article, expecting it on stdout. If the result is empty, pullnews will use the sequence number of the oldest article known to the host instead.

    This is how it pulls the entire backlog on its first run and only the new articles on all subsequent runs.

  2. This saves a retrieved article into the store, with the specified sequence number. The article data is presented to the store on stdin.

Not very complicated. Any storage command which follows this simple API can be used as a backend of pullnews.
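
To make that concrete, here is a minimal sketch of such a backend, written as a plain shell script. Both the name flatstore and the exact calling convention are assumptions made for illustration; before relying on something like this, check store_cmd in the pullnews source for how the directory and the sequence number are actually passed.

#!/bin/sh
# flatstore -- hypothetical minimal storage backend for pullnews (sketch).
# Assumed calling convention:
#   flatstore <dir>           print the sequence number of the last stored
#                             article on stdout (empty if nothing stored yet)
#   flatstore <dir> <seqnum>  read one article from stdin and store it
DIR=$1
SEQ=$2

if [ -z "$SEQ" ]; then
    # query: the highest numbered file is the last stored article
    ls "$DIR" 2>/dev/null | sort -n | tail -n 1
else
    # save: one file per article, named by its sequence number
    cat > "$DIR/$SEQ"
fi

Such a command would then take the place of dirstore in the backup script, as the trailing arguments of the pullnews invocation.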

Happy Tcling.