Pondering a Query Language for Mail

Query languages are typically used for databases. The basic formula of a query follows subject-predicate form: which entries are targeted and what to do with them.

The simplest form is “print (or return) all entries in this database.”

When you’re dealing with a single piece of mail, you don’t really need a query language. It’s just a matter of directly applying predicates to it.

But when you want to do batch operations, it seems like a query language would be much more useful.

Thankfully, there’s always Python. With a few lines of it you can do the equivalent of what you would do with a query language.

Something like this (untested):

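import datetime
import mailbox
import os

# mailbox does not expand '~', so do it explicitly.
# factory=None yields MaildirMessage objects (on Python 2 the default
# factory would produce an rfc822.Message instead).
md = mailbox.Maildir(os.path.expanduser('~/Mail'), factory=None)
for key, mail in md.iteritems():
    if not mail['Subject']:
        # get_date() is the delivery time in seconds since the epoch.
        date = datetime.datetime.fromtimestamp(mail.get_date())
        # Remove any empty Subject first; assignment alone appends a header.
        del mail['Subject']
        mail['Subject'] = "[Received on {0}]".format(date)
        md[key] = mail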

This gives any mail that lacks a subject a new one saying when it was received. That’s probably not very useful in itself, but the same pattern might be useful for other things.

But it’s still a bit complicated: you need to know Python, and if you need to operate on several folders, it gets more complicated still.
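To give a feel for how much of that complication a small helper could hide, here is a rough sketch of the "which entries, and what to do with them" idea applied across a Maildir and its subfolders. The names iter_mailboxes, run_query, missing_subject and label_no_subject are made up for illustration; only the mailbox calls (list_folders, get_folder, iteritems) come from the standard library. Treat it as a starting point under those assumptions, not a finished tool.

```python
import mailbox


def iter_mailboxes(root):
    """Yield (name, mailbox) for the top-level Maildir and each subfolder."""
    yield "", root
    for name in root.list_folders():
        yield name, root.get_folder(name)


def run_query(root, selector, action):
    """Apply action to every message that selector picks out.

    selector(message) -> bool decides which entries are targeted.
    action(message) changes the message in place (what to do with them).
    Returns the number of messages written back.
    """
    changed = 0
    for _name, box in iter_mailboxes(root):
        for key, message in box.iteritems():
            if selector(message):
                action(message)
                box[key] = message  # store the modified copy under the same key
                changed += 1
    return changed


# Hypothetical selector and action, mirroring the earlier example.
def missing_subject(message):
    return not message["Subject"]


def label_no_subject(message):
    del message["Subject"]  # no error if the header is absent
    message["Subject"] = "[no subject]"


# Example call (path is an assumption):
# root = mailbox.Maildir("/home/me/Mail", factory=None)
# run_query(root, missing_subject, label_no_subject)
```

The selector and the action are plain functions, so a real query language would mostly be a friendlier way of writing those two pieces.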

Also, given the number of systems a mail might have passed through, with all the different filters and handlers, each of which may tack on its own special headers, mail is rather messy. Add to that the fact that you may need to leave messages untouched for reasons of regulatory compliance and/or security, and the medium becomes a great joy to deal with.

My guess is that separating e-mail into a usable copy and a canonical copy is the best strategy. Every mail goes into a read-only archive that is never modified and is only ever read from, while another copy goes to the user, who can mangle it as much as they please.
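As a sketch of that split, assuming both copies live in Maildirs: deliver stores each incoming message twice, and restore pulls the pristine bytes back out of the archive when the working copy has been mangled beyond use. Both function names are hypothetical, and nothing here actually enforces read-only access on the archive; that would have to come from file permissions or the storage layer.

```python
import mailbox


def deliver(raw_message, archive, working):
    """Store one incoming message in both the archive and the working Maildir.

    raw_message is the message as bytes. Returns (archive_key, working_key)
    so the two copies can be matched up later.
    """
    archive_key = archive.add(raw_message)
    working_key = working.add(raw_message)
    return archive_key, working_key


def restore(archive, archive_key, working, working_key):
    """Overwrite the working copy with the untouched archived bytes."""
    working[working_key] = archive.get_bytes(archive_key)
```

Some mapping between archive keys and working keys has to be kept somewhere; where that lives is left open here.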

I’ll also take issue with the interfaces used for mail. There has to be a better way to build an interface for cases where people have to manually sort through many items.

A few thoughts on that:

  • Don’t necessarily show the user the whole mail as one piece. By breaking up subject, from/to, body, etc. and showing these independently, it may allow the user to make better judgments. They don’t have to look at the subject and the from/to, two types of data, in rapid succession.
  • Reshow the mail several times. By having it appear more than once, the user has less pressure to get it right in one go. In a normal interface, if they misfile the mail, it will be permanently misfiled. If they know they will see it a few more times, they can give their best guess and the system can sequester mail that has conflicting filings over multiple passes.
  • Show progress. A user who sees that they are on page X of Y has some idea of progress, but in most systems they’ll stay on the first page while Y diminishes, so they won’t have as clear an idea of how much they’ve accomplished.
  • Let modes be modes. While some activities eschew modal uses, others thrive as modes. Manual sorting is such an activity. Instead of trying to have the same interface function for both browsing and sorting, a dedicated sorting mode is perfectly fine.
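The reshowing and progress ideas above can be sketched without any interface at all. The class below is hypothetical: it records one filing decision per item per pass, reports how far along the user is, and at the end files only the items that were filed the same way every time, setting the rest aside. The class name, the default of three passes and the "all passes must agree" rule are assumptions for illustration.

```python
from collections import Counter, defaultdict


class MultiPassSorter:
    """Collect several filing decisions per item before committing any."""

    def __init__(self, item_ids, passes=3):
        self.item_ids = list(item_ids)
        self.passes = passes
        self.votes = defaultdict(Counter)  # item id -> Counter of folder names

    def record(self, item_id, folder):
        """Note that the user filed item_id under folder on some pass."""
        self.votes[item_id][folder] += 1

    def progress(self):
        """Fraction of the expected decisions made so far."""
        total = len(self.item_ids) * self.passes
        done = sum(sum(counts.values()) for counts in self.votes.values())
        return done / total if total else 1.0

    def resolve(self):
        """Return (filed, sequestered).

        filed maps item id -> folder for items filed identically on every
        pass; sequestered lists items with conflicting or missing decisions.
        """
        filed, sequestered = {}, []
        for item_id in self.item_ids:
            counts = self.votes[item_id]
            if len(counts) == 1 and sum(counts.values()) >= self.passes:
                filed[item_id] = next(iter(counts))
            else:
                sequestered.append(item_id)
        return filed, sequestered
```

Because progress is measured against a fixed total of items times passes, it only ever goes up, which avoids the shrinking "page 1 of Y" effect.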

When used with a query language, such a sorting activity might just be to weed out any false positives. Other times, the query language would be used to feed into the sorting, with sorting being the predicate.

It’s important to recognize that what applies to e-mail can apply equally to any sequence of complex artifacts: news, search results (web pages, library books, etc.), and so on. Currently, you get a large set of ordered results from services like Google, with little ability to cull items from it. While the engineers work hard to guarantee the relevance of items, a further search within the set would often be better left to the user, equipped with more advanced tools.
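A minimal sketch of what such a tool might boil down to, with nothing mail-specific in it: the user stacks small filters on top of a result set they already have. The Result record and the from_source and title_contains helpers are invented for the example, and no real search service exposes results in exactly this shape.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    source: str  # e.g. "web" or "library"
    rank: int    # position in the original ordering


def refine(results, *filters):
    """Keep the results that pass every filter, preserving original order."""
    return [r for r in results if all(f(r) for f in filters)]


def from_source(source):
    """Filter: result comes from the given source."""
    return lambda r: r.source == source


def title_contains(word):
    """Filter: title contains word, ignoring case."""
    return lambda r: word.lower() in r.title.lower()


# Example: narrow an existing result list in two steps.
# hits = refine(results, title_contains("mail"))
# hits = refine(hits, from_source("library"))
```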