Concurrent IMAP access with mbox format files

At some point, I would like to write an article about how much IMAP sucks, and then create a mail protocol that doesn't suck. But for now, I'm more concerned with implementation issues, and why they suck.

For whatever reason, some people hate NFS (Network File System), even though it is a popular and widely used filesystem. One of those people is Mark Crispin, creator of the IMAP protocol and author of the IMAP libraries regarded as the standard. This is unfortunate, because it has made life generally miserable for a large class of computer users through no fault of their own.

Mark has claimed, and it has been widely repeated, that it is impossible to provide concurrent access to mbox-format files with IMAP. In fact, if more than one client connects, the standard IMAP server closes the connection of the last client. The reasons given are that mbox is a lousy format and that better formats exist. Both are true, but there's no reason an IMAP implementation couldn't allow concurrent access anyway. And somehow NFS is part of that decision too.

Of course, there's the mbx format, which does allow concurrent access, and it really isn't too hard to switch, except for the little gotcha that the mbx format is not supported on NFS. And again, contrary to popular opinion, there is nothing inherently incompatible between NFS and mbx. But because this has been repeated by so many people, it's been accepted.

Tom's First Law of Software Development

I'm going to let you in on a little secret, something I've discovered as a software developer, a little gem of truth that people should be aware of:

When a software developer says something is impossible, they are almost certainly lying.

Now, we can mean many other things, some of which are useful. Unfortunately, the most likely thing we mean is "I don't feel like it". Or, to be more generous, "That sounds complicated". On a good day we mean "You can't afford that solution", which is a fair answer in the real world. In some cases, we might mean "There is no known solution to that problem at this time". How rare that answer is depends on context: it's fair for pie-in-the-sky problems like conversational computers, but it doesn't apply to most everyday problems. And then, every once in a great while, we actually mean "That problem is computationally undecidable". That's an extremely rare answer.

In the case of IMAP, though, the answers offered about mbx over NFS and about concurrent access to mbox files are not the rare kind. They are the most common kind, the "I don't feel like it" answers.

Back to the original question

Getting back to the original question, let's do a simple thought experiment about IMAP servers. Or if you like, you can think of it as an implementation challenge, because this is really pretty easy.

The most common IMAP server implementations work like this: you connect to a master IMAP server process and authenticate. The server then forks a new process just for you, which handles all of your IMAP needs. This is your basic stone-simple server setup for all kinds of services, and it's where the (fairly trivial) problem starts. If you open another connection for the same user (and mailbox) from somewhere else, that connection gets its own process, and this is the conflict: you now have two processes potentially writing to the same file.

This is not an insurmountable problem. It's one of the most common problems in computing. Various file locking techniques can be used to make concurrent access safe. And to be completely fair here, NFS suffers from low-probability locking issues that can result in rare file corruption. But again, once you know about this issue, detecting and handling it isn't an insurmountable problem.
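As a minimal sketch of the file locking approach, assuming Python on a POSIX system, the hypothetical `append_message` helper below takes an exclusive POSIX record lock with `fcntl.lockf` before appending a message to an mbox file. POSIX record locks are used here (rather than `flock`) because NFS clients have traditionally supported them through the NFS lock manager. The helper name, the mboxo-style escaping, and the assumption that every writer uses this same lock are illustrative choices, not a description of any existing IMAP server; real mail software often combines several locking schemes, and the NFS edge cases mentioned above would still need handling.

```python
import fcntl
import os
import time


def append_message(mbox_path, sender, body):
    """Append one message to an mbox file while holding an exclusive lock.

    Assumption: every program that writes this mailbox takes the same
    POSIX record lock; the lock only protects against cooperating writers.
    """
    # Minimal mboxo-style escaping: a body line starting with "From " would
    # otherwise be read as the start of a new message.
    lines = body.splitlines(keepends=True)
    escaped = "".join(">" + line if line.startswith("From ") else line
                      for line in lines)
    if not escaped.endswith("\n"):
        escaped += "\n"
    # The "From " separator line: envelope sender plus an asctime-style date.
    record = f"From {sender} {time.asctime(time.gmtime())}\n{escaped}\n"

    with open(mbox_path, "a", encoding="utf-8") as mbox:
        fcntl.lockf(mbox, fcntl.LOCK_EX)      # blocks until no one else holds it
        try:
            mbox.write(record)
            mbox.flush()                      # hand the data to the OS...
            os.fsync(mbox.fileno())           # ...and on to the disk or NFS server
        finally:
            fcntl.lockf(mbox, fcntl.LOCK_UN)  # release before closing
```

Two server processes calling `append_message` on the same path will take turns: the second blocks in `lockf` until the first has written, synced, and released the lock.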

But this isn't even the major point of my thought experiment. Suppose we added this bit of pseudocode to the server logic:

if (this user is already logged in) {
  connect to existing process
} else {
  fork a new process
}

There we go, folks, problem solved. There is no longer a concurrent access issue. Even if the user were actually using two different web applications at the same time to make changes (an extremely rare case to worry about), everything they do now goes through a single-threaded process, so changes can only happen one at a time.
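Here is one way that pseudocode could be fleshed out, as a sketch only. It substitutes threads inside a single Python process for the forked processes described above: a hypothetical `SessionRegistry` hands every connection for a user the same `MailboxSession`, and that session's single worker thread applies commands one at a time from a queue. All names are invented for illustration, and a plain list stands in for the mbox file. Handing a live connection to an already-running process, as the pseudocode describes, would need something like file-descriptor passing over a Unix socket, which this sketch does not attempt.

```python
import queue
import threading


class MailboxSession:
    """Hypothetical per-user session. One worker thread owns the mailbox,
    so commands from all of the user's connections run one at a time."""

    def __init__(self, user):
        self.user = user
        self.messages = []              # stand-in for the user's mbox file
        self._commands = queue.Queue()  # shared by every connection for this user
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        # Only this thread modifies self.messages (via the queued commands).
        while True:
            command, args, reply = self._commands.get()
            try:
                reply.put((True, command(self, *args)))
            except Exception as exc:    # hand errors back to the caller
                reply.put((False, exc))

    def call(self, command, *args):
        """Queue a command, then block until the worker has applied it."""
        reply = queue.Queue(maxsize=1)
        self._commands.put((command, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value


class SessionRegistry:
    """Hypothetical front end: routes each login to the user's one session."""

    def __init__(self):
        self._sessions = {}
        self._lock = threading.Lock()

    def connect(self, user):
        with self._lock:
            session = self._sessions.get(user)
            if session is None:         # not logged in yet: start a session
                session = MailboxSession(user)
                self._sessions[user] = session
            return session              # already logged in: reuse it


def append(session, message):
    """Example command: append a message and return the new count."""
    session.messages.append(message)
    return len(session.messages)


if __name__ == "__main__":
    registry = SessionRegistry()
    phone = registry.connect("alice")
    laptop = registry.connect("alice")
    print(phone is laptop)              # True: one session for both logins
    phone.call(append, "message from phone")
    laptop.call(append, "message from laptop")
    print(phone.messages)               # ['message from phone', 'message from laptop']
```

The design choice is the one the pseudocode makes: instead of letting two workers race for the mailbox and then arbitrating with locks, there is only ever one worker per user, so the mailbox needs no locking at all within the server.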

Some of you might be shaking your heads, saying this isn't a very robust solution, and asking what happens if your email needs to be served by multiple servers. But that is not the common case. And that problem is just as solvable as any other in computing. And it isn't clear that IMAP does such a great job of supporting that situation anyway.
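For the multi-server case, one well-known technique (not something the article specifies) is rendezvous, or highest-random-weight, hashing: every front end computes the same backend for a given user without any shared state, so all of that user's connections still land on one server and the one-process-per-user idea carries over. The `pick_backend` function and the server names below are hypothetical, a sketch under those assumptions.

```python
import hashlib


def pick_backend(user, backends):
    """Rendezvous (highest-random-weight) hashing.

    Every front end that calls this with the same backend list picks the
    same backend for a user, with no shared state between front ends.
    """
    def weight(backend):
        digest = hashlib.sha256(f"{backend}\0{user}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=weight)


if __name__ == "__main__":
    servers = ["imap1.example.com", "imap2.example.com", "imap3.example.com"]
    print(pick_backend("alice", servers))   # same answer on every front end
```

A useful property of this scheme is that removing a server only moves the users who were assigned to it; everyone else keeps their backend, and the order of the server list doesn't matter.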

And again, if you are thinking things like that about this solution, you're much like the programmer who says "impossible" but means "I don't feel like it". How much hassle has been caused worldwide by users dealing with these concurrent access issues, all because a programmer didn't feel like it? How much financial cost has been incurred buying physical hardware to solve a "problem" that could have been avoided if the programmer had felt like it?

Software development is not an art. It's not a hobby. It's a professional career. The goal is to serve the needs of the customer, not our personal computing aesthetic, nor our basic laziness.

So my challenge to you is to go out there and implement this. Fame and fortune can't help but find you if you do.

Tom Fine's Home Send Me Email