December, 2008

Cron, Network Users, and Launchd

When our users try to set up cron jobs (using crontab -e), they seem to work fine, until the system reboots, at which point the cron jobs stop running. If the user re-edits their cronfile, the cron job will start working again.

The problem is that cron is starting up before NIS is running. When cron starts, it looks at all the files in /usr/lib/cron/tabs (which is actually /var/at/tabs), and looks up each filename to see if it is the name of a user. If it is, then cron remembers it as a job, but if it is not, cron just skips over it.
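
To make the failure concrete, here is a rough sketch in Python of the startup scan described above. It is not cron's actual code: the function name scan_tabs and the injectable lookup parameter are made up for illustration, and the only real call is pwd.getpwnam(), which raises KeyError when the system cannot resolve the name.

```python
import os
import pwd


def scan_tabs(tabs_dir, lookup=pwd.getpwnam):
    """Return (loaded, skipped) crontab names, imitating the startup scan.

    Hypothetical helper: `lookup` defaults to pwd.getpwnam, which raises
    KeyError when the name cannot be resolved to a user.
    """
    loaded, skipped = [], []
    for name in sorted(os.listdir(tabs_dir)):
        try:
            lookup(name)
        except KeyError:
            # Unknown user right now, e.g. a NIS user before NIS is up.
            skipped.append(name)
        else:
            loaded.append(name)
    return loaded, skipped
```

Run against the tabs directory before NIS is up, every network user would land in the skipped list, even though their crontab files are sitting right there.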

Cron will rescan the directory when it changes (e.g. when a user edits their crontab), and at this point if NIS is running, then cron will find all of the jobs it might have lost.

As far as I can tell, there is no longer any way in launchd to tell cron that it shouldn't start running until after directory services are running. Because of this, I suspect there may also be a problem here for LDAP/Open Directory users (and not just NIS), though I haven't tested this myself.

Some might be thinking at this point, well, why don't you just use launchd? The whole point, after all, was for launchd to replace cron. Well, if you take the view that cron's only purpose is system automation, then maybe that's ok. But in fact, cron is also an end-user program, which is now broken.

At any rate, I don't even think this will work. User LaunchAgents live in their home directories. And we have network-mounted home directories. Does this mean that my jobs would only run if I was logged in? This is the same problem all over again. You could always "fix" launchd to mount all network homes, and look for jobs, but this would then mean my job would run on every machine where my home directory could be mounted, never mind the ridiculous waste of network resources searching for jobs. And launchd jobs are much harder for an end-user to set up than cron jobs too. Even if launchd could be fixed to somehow properly handle user jobs in a resource friendly way, my users would still prefer cron for most things.

Workaround

I don't have any fix. For a workaround, I'm working on a script that will run at boot time, and will update the crontab directory once it sees that NIS is up. Ugly stuff.
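
A minimal sketch of what such a script could look like, written in Python. Everything here is an assumption rather than a finished fix: the helper names wait_for_user and touch_dir are invented, "is NIS up" is approximated by checking whether a known network user resolves through pwd.getpwnam(), and "update the crontab directory" is assumed to mean bumping the directory's modification time so that cron notices a change.

```python
import os
import pwd
import time


def wait_for_user(username, lookup=pwd.getpwnam, timeout=300.0,
                  interval=5.0, sleep=time.sleep):
    """Poll until `username` resolves. True on success, False on timeout."""
    waited = 0.0
    while True:
        try:
            lookup(username)
            return True
        except KeyError:
            # Directory service not answering yet (or no such user).
            if waited >= timeout:
                return False
            sleep(interval)
            waited += interval


def touch_dir(path):
    """Set the directory's access and modification times to now."""
    os.utime(path, None)
```

At boot, the script would call wait_for_user() with the name of some user who exists only in NIS, and on success call touch_dir() on the tabs directory. It would have to run as root, and it is still polling, which is exactly the kind of thing that shouldn't be necessary.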

All Bow to Launchd

The real problem here is launchd. The designers of launchd are very proud of how strictly pedantic they have been in writing launchd. They seem to be very proud that there is in fact no way of setting up the needed dependency in launchd. They would have you believe that it's the application (cron) that has the problem, and not launchd.

What a load of crap. Our job as software designers is not to program our hearts' desires, and convince people that our new toy is more beautiful. Our job is to provide the functionality that the customers require and even desire. Launchd lacks an obvious and important feature - explicit dependencies. Cron doesn't work properly under launchd. And ultimately, cron was there first. Don't believe them when they try to tell you this is a good thing.

But that's just being pragmatic. Ultimately, the "purist" view that launchd proponents are offering is flawed even in the theoretical. According to launchd proponents, cron should "negotiate" its own dependencies through IPC. As far as I can tell, this means that cron should start up, and then it should sit around watching for directory services to become active. But if the benefit of launchd is that services only run when they need to run, and launchd handles the rest, then why would they want cron to sit there running and watching for an event where NIS comes on line? This makes no sense.

It also makes no sense in terms of clean layering and application design. Cron should not have to care about where its users are coming from. It should use the getpwnam() call, and trust that the system has already done the work behind getpwnam(). Cron has NO BUSINESS watching for directory services. It's absolutely ridiculous to expect cron or any other computer service to actively participate in the system startup process, because this means that services suddenly have to know a whole lot of extra stuff that's really not their business.

Imagine if Apple said that every application should go out and negotiate its own printing paradigm - find printers, discover the models and features, and communications protocols. No one would actually fall for this, because it's the operating system's job to provide printing to the applications. But this is exactly what they're doing for the resource of orderly booting.

Launchd has some good stuff. Delayed startup of network applications is a really great idea. But that's no excuse for throwing away basic features needed for booting. Launchd needs explicit dependencies.


Reader Comments (Experimental. Moderated, expect delays. Posts may be edited or ignored. I reserve the right to remove any or all comments, at any time.)

8 comments:

At 2009/01/26 14:09
Minor Daemon wrote:

Who the heck still uses cron? ;)

Honestly though, cron is supplied with the system, but it is basically an add-on. It seems to me it's up to the administrator to configure add-on software, for backward and forward compatibility, until the developer provides an update resolving any issues that might have cropped up -- if they so choose.

Creating a Launch Agent and script can provide login-time environment validation and remediation when required, no?

At 2009/01/26 14:31
wrote:

Apple claims that they support OS X as a standard Unix desktop replacement. In that context, cron can not be considered "add-on".

Yes, there are a few different kludgy ways I could get around this. But it shouldn't be my job to fix Apple's bugs.

tom

At 2009/08/29 18:53
Ryan Bowlby wrote:

Launchd in 10.5 replaced the "OnDemand" value within .plist job files with "KeepAlive". The new "KeepAlive" allows two options that help address explicit dependencies.

1. PathState - "Each key in this dictionary is a file-system path. If the value of the key is true, then the job will be kept alive as long as the path exists. If false, the job will be kept alive in the inverse condition. The intent of this feature is that two or more jobs may create semaphores in the file-system namespace."

2. NetworkState - "If true, the job will be kept alive as long as the network is up, where up is defined as at least one non-loopback interface being up and having IPv4 or IPv6 addresses assigned to them. If false, the job will be kept alive in the inverse condition."

So now you can configure crond to require the presence of /some/directory/ before being launched.
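
As an illustration only, a job definition using PathState might be generated like this with Python's plistlib. The label is hypothetical, the cron path /usr/sbin/cron is an assumption about the system, and /some/directory/ is the placeholder from above rather than a real path. This is not the plist that ships with the system.

```python
import plistlib

# Hypothetical job definition: keep cron alive only while the path exists.
job = {
    "Label": "example.cron.pathstate",       # made-up label
    "ProgramArguments": ["/usr/sbin/cron"],  # assumed location of cron
    "KeepAlive": {
        "PathState": {
            "/some/directory/": True,         # placeholder path
        },
    },
}

# Serialize to the XML property-list format that launchd job files use.
plist_xml = plistlib.dumps(job).decode("utf-8")
print(plist_xml)
```

Something else would still have to create that path once the dependency is actually ready.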

I guess the apple developers ventured from their ivory tower for a bit. I'm as surprised as you are. ;)

My site: www.ryanbowlby.com

At 2009/08/31 13:30
wrote:

I'm not sure if I was aware of those when I wrote this article. They don't seem sufficient (to me) to adequately address the issues, and specifically, I don't see how they can be used to delay cron startup until the needed directory service is running.

tom

At 2009/09/05 4:56
Ryan Bowlby wrote:

I agree these two directives are less than sufficient. They probably only continue supporting crond so as to remain POSIX compliant. I continue to use cron; creating a plist file for something as rudimentary as a scheduled task is a less than efficient use of time.

Is there any file/directory on the system that exists only after directory services has started? Such as a PID file or additional mounts.

At 2009/09/08 21:33
wrote:

I don't know of any files I could use. And things like pid files have the annoying habit of continuing to exist after the process or system has crashed, so in general, it wouldn't be a reliable method.

tom

At 2011/03/14 0:26
wrote:

good comments.

cron is lame.

the author of it is not to be trusted to code in the users' interests,

anymore than the launchd authors. it's all about the $$$ they get from corporations who haven't the foggiest idea what makes for reliable software.

i wrote a silly shell function that takes a date or time period in any format, calculates the time to wait, sleeps then executes. simple.

no daemon wakeup nonsense. can be combined with nohup.

no resource competition issues like with at(1).

then later i discovered paul jarc (code.dogmap.org) wrote something like this in c called runwhen.

no one is perfect but djb and his followers have a much more solid approach to this stuff.

to take another example, the init on my os has grown from 18 lines to 80+ lines of code since the bsdi days.

these are very simple parts of the system that have been unnecessarily tweaked to the point of becoming a nuisance.

but adding features helps sell software and support, so...

At 2011/03/14 9:03
wrote:

I certainly appreciate the idea of keeping things simple. But as Einstein said, "Make things as simple as possible, but not simpler." A program which sleeps for a while and then runs something is not cron. It is not even the "at" command. The scheduling won't survive a reboot. Unless of course you think that "reliable" is just an unnecessary tweak.

And cron was written by different people over time, with the original version attributed to Brian Kernighan. I've never heard anyone say he's not to be trusted.

tom

End Comments

Tom Fine's Home Send Me Email