Download JMail 1.1 / Mail me, jyelon@uiuc.edu / Back to my Software Page
JMail's primary function is to archive a mailing list and export that mailing list to the world-wide web. However, it can be used for many mail-archiving needs, or even for some mail handling needs.
JMail is a single program with many functions. These separate functions can be assembled together into a complete mail-handling system.
To demonstrate the abilities of JMail, here is a JMail archive in my account. It contains messages from the ars-magica@csua.berkeley.edu mailing list. It is active: messages are added continuously, as they arrive in my mailbox. This functionality is performed without my intervention.
JMail's first argument is always a word selecting a function to execute. The available functions are listed below.
Usage: jmail arc-build archive < mailfile
This function converts messages in UNIX mail file format into a JMail archive. JMail archives use only moderate disk space, yet they support highly efficient retrieval of messages by message-number, message-date, or by word search. This makes them an ideal format for long-term storage of messages.
If archive already exists, the messages are appended to the archive. If it does not already exist, it is created.
Warning: if you invoke jmail (or any mail-archiving software) from a UNIX forward file, you need to be aware of locking issues. See jmail claim-lockfile below.
Usage: jmail arc-dump archive > mailfile
This function converts a JMail archive back into a UNIX mailbox file.
Usage: jmail arc-lookup-text archive < commandlist > messages
This function retrieves selected messages from a JMail archive. The messages are retrieved in UNIX mailbox format. The retrieved messages are sent to standard output.
Messages are selected based on the retrieval commands read from standard input. Each retrieval command is a single letter followed by a set of parameters. The currently supported retrieval commands are:
Notes:
1. Dates are in seconds, as returned by the UNIX time function.
2. A single integer on a line by itself is also a retrieval command; it retrieves the message with that number.
3. To perform a word search, the JMail archive must first be indexified; see jmail arc-indexify below.
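Note 2 is enough to script a simple retrieval without any of the letter commands. The following is a minimal sketch, assuming a POSIX shell; the helper names message_numbers and lookup_range are hypothetical and not part of JMail.

```shell
# Print the integers from $1 to $2 inclusive, one per line.
# Each line is a valid retrieval command (see note 2 above).
message_numbers() {
    n=$1
    while [ "$n" -le "$2" ]; do
        printf '%s\n' "$n"
        n=$((n + 1))
    done
}

# Hypothetical helper: fetch messages $2..$3 from archive $1 and write
# them to standard output in UNIX mailbox format.
lookup_range() {
    message_numbers "$2" "$3" | jmail arc-lookup-text "$1"
}
```

For example, `lookup_range archive 100 120 > messages` would request messages 100 through 120. For commands that take dates, `date +%s` prints the current time in seconds since the epoch on most UNIX systems.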
Usage: jmail arc-lookup-html archive < commandlist > messages
This function retrieves selected messages from a JMail archive. The messages are retrieved in HTML format. Typically, this function would be used inside a CGI script, making it possible to export the contents of a JMail archive to the world-wide web.
Messages are selected based on the retrieval commands read from standard input. Retrieval commands are described above, under jmail arc-lookup-text.
Usage: jmail arc-lookup-text-index archive < commandlist > messages
This function retrieves descriptions of selected messages from a JMail archive. The descriptions are printed on standard output. Each description is a comma-separated list with these fields:
There may be comments interspersed with the descriptions. Each comment is a line beginning with '#'.
Messages are selected based on the retrieval commands read from standard input. Retrieval commands are described above, under jmail arc-lookup-text.
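Because comments can be interspersed with the descriptions, a script that consumes this output will usually want to filter them out first. A minimal sketch, assuming a POSIX shell; the function names are hypothetical:

```shell
# Drop comment lines (those beginning with '#') from an index listing,
# leaving only the comma-separated descriptions.
descriptions_only() {
    grep -v '^#'
}

# Count the descriptions in a listing read from standard input.
count_descriptions() {
    grep -c -v '^#'
}
```

A typical use would be `jmail arc-lookup-text-index archive < commandlist | descriptions_only`.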
Usage: jmail arc-lookup-html-index archive < commandlist > messages
This function retrieves descriptions of selected messages from a JMail archive. The descriptions are in a tabular HTML format, suitable for inclusion on a web-page. Typically, this function would be used inside a CGI script as a means to export a JMail archive to the Web. The HTML contains the word SCRIPT on each line, and this word should be replaced with the URL of the CGI script.
Messages are selected based on the retrieval commands read from standard input. Retrieval commands are described above, under jmail arc-lookup-text.
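The SCRIPT substitution can be done with sed. The sketch below assumes a POSIX shell; fill_script_url and print_index are hypothetical names, and the Content-type line is the usual CGI response header rather than anything JMail emits.

```shell
# Replace every occurrence of the placeholder word SCRIPT on standard
# input with the URL given as $1.
fill_script_url() {
    # Escape characters that are special in a sed replacement: '&', '\',
    # and the '|' used as the delimiter below.
    url=$(printf '%s\n' "$1" | sed 's/[&|\]/\\&/g')
    sed "s|SCRIPT|$url|g"
}

# Hypothetical CGI body: read retrieval commands from standard input and
# print the index table for archive $1 with links pointing at URL $2.
print_index() {
    printf 'Content-type: text/html\r\n\r\n'
    jmail arc-lookup-html-index "$1" | fill_script_url "$2"
}
```

Using '|' as the sed delimiter means the slashes in a URL need no escaping.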
Usage: jmail arc-indexify archive threshold minoccur
To perform word searches on a JMail archive, the archive must be indexified. Indexification is the process of building a table recording which words occur in which messages. To reduce the size of the table, the indexification software discards two kinds of words:
1. Words that are extremely common, like "the".
2. Words that are extremely rare, and therefore probably misspelled.
The parameter threshold is the percentage of messages above which a word is considered too common. A reasonable value for threshold is 40, meaning that words occurring in more than 40% of the messages are ignored. The other parameter, minoccur, is the minimum number of messages a word must occur in to be included. A reasonable value for minoccur is 2, meaning that a word must occur in at least 2 messages to be included in the index.
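To make the two cut-offs concrete, here is a small model of the selection rule written with awk. It is an illustration only, not JMail's indexer: the input format (one "message-number word" pair per line) and the name select_index_words are assumptions made for the example.

```shell
# Read "message-number word" pairs on standard input and print the words
# that survive both cut-offs, sorted.
#   $1 = threshold (percent of messages), $2 = minoccur (messages)
select_index_words() {
    awk -v threshold="$1" -v minoccur="$2" '
        !seen[$1 SUBSEP $2]++ { docs[$2]++ }   # count a word once per message
        !msg[$1]++            { total++ }      # count distinct messages
        END {
            for (w in docs)
                # keep words in at least minoccur messages and in no more
                # than threshold percent of all messages
                if (docs[w] >= minoccur && docs[w] * 100 <= threshold * total)
                    print w
        }
    ' | sort
}
```

With threshold 40 and minoccur 2 on a five-message sample, a word found in all five messages is dropped as too common, a word found in only one is dropped as too rare, and a word found in exactly two (40%) is kept.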
After indexifying an archive, all messages in the archive can be retrieved via word search. If you later add messages to the archive using arc-build, those new messages are not available for word search: a word search retrieves only messages that were already in the archive at the time of indexification. However, since indexification is slow (maybe 5 minutes for 25000 messages), it isn't feasible to indexify after each message addition. We therefore suggest indexifying an archive about once per day.
While indexifying, jmail reports how many messages it has finished by printing msg #100, msg #200, msg #300, and so forth. When running jmail arc-indexify from a cron job, you will probably want to redirect this status report to /dev/null.
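A daily re-index could be wrapped as follows. This is a sketch under stated assumptions: the function name reindex_quietly is hypothetical, and both output streams are discarded because the description does not say which one carries the progress lines.

```shell
# Rebuild the word index for archive $1 with the suggested settings
# (threshold 40, minoccur 2), discarding the progress report.
# Assumption: silencing both stdout and stderr, since it is not stated
# which stream the "msg #100" lines are written to.
reindex_quietly() {
    jmail arc-indexify "$1" 40 2 > /dev/null 2>&1
}
```

Discarding standard error also hides genuine error messages, so it may be worth checking the exit status of the call in the cron script.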
Usage: jmail claim-lockfile filename
It is possible to set up a UNIX forward file so that messages are automatically received, processed, and added to a JMail archive. However, there is a small problem: the UNIX mail handling software does not wait for one message to be delivered before it tries to deliver the next one. Therefore, two messages can accidentally be added to a single archive at the same time, corrupting the archive. To avoid this, you must use locking.
JMail claim-lockfile is a simple means to achieve locking in a shell-script. JMail claim-lockfile creates the specified lockfile. If the lockfile is already there, jmail waits for it to be removed before creating it. Therefore, to achieve mutual exclusion in a shell-script, simply use this sequence:
    jmail claim-lockfile LOCK
    (perform file manipulations that must be exclusive)
    rm -f LOCK
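Applied to archiving, the sequence might look like the sketch below. The function name archive_one_message is hypothetical, and the code assumes that jmail follows the usual convention of returning a non-zero exit status on failure, which is not stated here.

```shell
# Hypothetical delivery helper: append the message on standard input to
# archive $1, using lockfile $2 to keep concurrent deliveries apart.
archive_one_message() {
    # Waits until the lockfile can be created.
    jmail claim-lockfile "$2" || return 1
    # The exclusive step: arc-build reads the message from standard input.
    jmail arc-build "$1"
    status=$?
    # Always release the lock, even if arc-build failed.
    rm -f "$2"
    return "$status"
}
```

Removing the lockfile on every path matters: a lock left behind would make every later delivery wait indefinitely.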
Usage: jmail unmangle < pseudo-mailfile > true-mailfile
JMail unmangle is used when your mail has previously been archived in some vaguely mailbox-like format. JMail uses some simple heuristics to identify message boundaries. It then converts the messages it has identified into genuine UNIX mailbox format, and sends them to standard output. The heuristics used to identify message boundaries are:
In particular, these heuristics were designed to split apart the mail-digests created by "majordomo". They work equally well on simple UNIX mail-files which have been "stripped down" for the sake of saving disk space, and also on the files created by "mh".
Usage: jmail keep-headers header1 header2... < mailfile > stripped-mailfile
This function is used to remove unwanted RFC headers from a mail-file. You specify a list of the headers you want to keep, and jmail removes the rest. Both the input and output are in UNIX mailbox format.
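Since both the input and output are mailbox files, keep-headers can be chained directly into arc-build. A minimal sketch; strip_and_archive is a hypothetical name, and the header names are passed through untouched because their exact spelling (with or without a trailing colon) is not specified here.

```shell
# Hypothetical helper: strip the mailbox on standard input down to the
# headers named in $2..., then append the result to archive $1.
strip_and_archive() {
    archive=$1
    shift
    # "$@" now holds only the header names to keep.
    jmail keep-headers "$@" | jmail arc-build "$archive"
}
```

When this runs from a forward file, the locking caveat for arc-build still applies.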
Usage: jmail divide defaultfile string1 file1 string2 file2 ... < mailmsgs
This function divides a UNIX mailbox file into several components. Each message is read from the standard input. If the message contains string1, it is added to file1. If the message contains string2, it is added to file2, and so forth. If the message contains none of the specified strings, it is added to the defaultfile.
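As an illustration, a mailbox could be split by mailing-list address. In this sketch the function name sort_mailbox and all output file names are hypothetical; only the argument order follows the usage line above.

```shell
# Hypothetical sorter: split the mailbox on standard input into files
# under directory $1. Messages matching neither string go to "other".
sort_mailbox() {
    jmail divide "$1/other" \
        'ars-magica@csua.berkeley.edu' "$1/ars-magica" \
        'jyelon@uiuc.edu' "$1/personal"
}
```

Because a string matches if the message contains it, choosing something distinctive such as a full list address helps keep unrelated messages out of a file.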
Admittedly, this function needs to be more powerful. I'm working on it.