While developing my latest PHP-based project, WebCron, I ran into an issue that has bothered me for a while - atomicity. By an atomic operation, I mean one where only one thread of one process at a time is allowed to execute some piece of code. Actually, even under C/C++, I've been bothered by this issue.
Unlike Linux, Windows really has the most friendly approach to creating an environment where atomic operations may thrive. Named mutexes are one area where Windows really, truly shines above all the OSes out there. Try porting CreateMutex() to another OS and you'll inevitably have some real head-scratching sessions when you try to do a cross-process, named mutex. So-called 'mutexes' under *NIX OSes are usually 'pthread'-based, which are really more in line with Windows "critical sections" than "mutexes". A programmer coming from the Windows world is going to be utterly confused because they've been pampered by Microsoft and no one in the Linux community sees any benefit in adding REAL mutexes to the kernel.
The difference lies in the fact that the Windows OS itself manages the mutexes. When you want to lock a mutex, a jump into kernel mode is made and the kernel manages the locking process. If a process or thread goes away while holding onto a locked (non-signaled) mutex, the Windows kernel eventually cleans up the mess left behind. And here is the key point: There is no equivalent mechanism on any other OS and definitely nothing that is cross-platform (natively). Or at least no single approach that has been "standardized".
Let's look at the options available for creating something close to a named mutex. I'll be specifically focusing on what is available for PHP:
flock() - This function is the first recommendation by the PHP developers as their "cross-platform" "solution". If you read the documentation and the comments carefully, it quickly becomes apparent that it does NOT behave the same under all OSes, especially Windows. Here are the core problems I found: the non-blocking option is ignored (i.e. doesn't work) on Windows, the return values are not consistent across OSes, it has issues with multi-threaded web servers, and it operates on an already-open file handle, so a "lock file" has to exist on the system (or be created) before anything can be locked.
System V Semaphores - These functions aren't available for Windows, require an integer instead of a name to identify the semaphore (seems hacky to me), and are not compiled into PHP by default.
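For reference, a non-blocking flock() attempt looks roughly like the sketch below. The lock file path is made up for the demo, and the behavior shown is what I would expect on a typical Linux system; as noted above, other platforms may not behave the same way.

```php
<?php
// Hypothetical lock file path in the system temp directory (an assumption for this demo).
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "flock_demo_" . getmypid() . ".lock";

// flock() needs an open handle, so the lock file is opened (and created if missing) first.
$fp = fopen($path, "c");

// LOCK_NB asks flock() to return immediately instead of waiting for the lock.
$wouldBlock = 0;
$locked = flock($fp, LOCK_EX | LOCK_NB, $wouldBlock);

if ($locked) {
    // ... critical section would go here ...
    flock($fp, LOCK_UN);
}

fclose($fp);
unlink($path);
```

Note that the lock is tied to the open handle, not to a name, which is part of why it didn't fit what I was after.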
For WebCron, what I needed was a mutex object that provided a cross-platform, cross-process, cross-thread, non-blocking, expire-capable, multi-lock capable, named mutex. When developing a product that will be used on who knows what OS, it has to at least support "The Big 3": Linux, Windows, Mac. Clearly the above functions weren't going to cut it for my needs. I needed something else.
I scoured the PHP documentation for a while. I knew the key was that the OS must manage the creation of the mutex, not the application. I knew that if the application managed the mutex, there would be race conditions and other nastiness involved.
Then I found the solution I was looking for: fopen().
Every OS manages the file system and all access to files. However, the 'w', 'a', and 'c' options of PHP's fopen() automatically create a file if it doesn't exist and happily open it if it does, while 'r' simply fails when the file is missing. None of those can tell me whether I was the one who created the file. And then I found...'x'. It creates a file if it doesn't exist and will return false if it does. The '@' prefix suppresses errors and warnings. Since the OS manages the creation of the file and since file system access is universal, I had found my cross-platform solution (i.e. silver bullet). And WebMutex was born.
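Here is a minimal sketch of the 'x' behavior. The lock file path is a made-up name in the system temp directory, not anything WebMutex actually uses.

```php
<?php
// Hypothetical lock file path (an assumption for this demo).
$lockFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "x_mode_demo_" . getmypid() . ".lock";
@unlink($lockFile); // start from a clean slate

// First attempt: the file does not exist, so it is created and a handle is returned.
$first = @fopen($lockFile, "xb");

// Second attempt: the file already exists, so fopen() returns false.
$second = @fopen($lockFile, "xb");

$firstSucceeded = ($first !== false);
$secondFailed = ($second === false);

// Close the handle before deleting; Windows refuses to delete an open file.
if ($first !== false) {
    fclose($first);
}
unlink($lockFile);
```

Whoever gets a handle back from the first call "owns" the lock; everyone else gets false.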
The first hurdle was deciding how to handle the lock files. I ended up opting for using an absolute path and filename and then appending '.lock'. I call @fopen($filename . ".lock", "xb") and check the result. If it is false (i.e. some other process beat this process to it), then I don't have the lock, sleep for a random amount of time, and try again. Releasing the lock was a matter of deleting the file. If Lock() is called multiple times, I just increment an internal counter instead of touching the file system again.
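The locking scheme just described can be sketched as follows. This is not the WebMutex source: the class name, method names, retry limit, and sleep range are all invented for illustration.

```php
<?php
// Hypothetical sketch; SketchMutex and its methods are illustrative names only.
class SketchMutex
{
    private $lockFile;
    private $count = 0; // how many times this object currently holds the lock

    public function __construct($filename)
    {
        $this->lockFile = $filename . ".lock";
    }

    // Try to get the lock, giving up after $maxTries attempts (an assumed limit).
    public function lock($maxTries = 50)
    {
        if ($this->count > 0) {
            $this->count++; // already held: just bump the counter
            return true;
        }
        for ($i = 0; $i < $maxTries; $i++) {
            $fp = @fopen($this->lockFile, "xb");
            if ($fp !== false) {
                fclose($fp);
                $this->count = 1;
                return true;
            }
            usleep(mt_rand(1000, 20000)); // sleep a random 1-20 ms, then retry
        }
        return false;
    }

    // Release one level of the lock; the file is deleted when the count hits zero.
    public function unlock()
    {
        if ($this->count === 0) {
            return false;
        }
        $this->count--;
        return ($this->count > 0) ? true : @unlink($this->lockFile);
    }
}

// Usage: two objects competing for the same (hypothetical) base filename.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "sketchmutex_" . getmypid();
@unlink($base . ".lock");
$a = new SketchMutex($base);
$b = new SketchMutex($base);

$gotA = $a->lock();       // creates the lock file
$gotAgain = $a->lock();   // counter only, no file system access
$gotB = $b->lock(3);      // fails while $a holds the lock
$a->unlock();
$stillHeld = file_exists($base . ".lock"); // one unlock is not enough
$a->unlock();
$gotBAfter = $b->lock(3); // succeeds now that the file is gone
$b->unlock();
```

Capping the number of retries is my own choice here; it keeps a caller from spinning forever if the lock never frees up.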
My next problem was that PHP scripts can only run for a limited number of seconds before PHP forces the script to exit. PHP doesn't run cleanup code when it forcefully stops execution, so a lock file can get left behind. So I had to come up with a way to delete a stale lock file. I eventually settled on creating another file using the fopen() 'x' option with '.stale' appended to the base filename, and only the process that manages to create it deletes the lock file. That way there aren't multiple processes deleting the lock file - potentially introducing a race condition that allows multiple locks to be obtained. Of course, it becomes possible to end up with both '.stale' and '.lock' files. However, that should be rather rare and requires manual intervention anyway.
The solution I came up with to the second problem isn't without its own problems though. If the file system is shared, the processes are running on different machines, and the clocks of those machines aren't perfectly in sync, the lock file might get deleted prematurely.
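The stale lock cleanup might look roughly like the sketch below. The function name, the use of the file modification time, and the $maxAge parameter are assumptions for illustration and not taken from WebMutex. It narrows the race but does not claim to eliminate it in every case.

```php
<?php
// Hypothetical helper: removes "$filename.lock" if it is older than $maxAge seconds.
function removeStaleLock($filename, $maxAge)
{
    $lockFile = $filename . ".lock";
    $staleFile = $filename . ".stale";

    // Assumption: a lock is "stale" when its modification time is too old.
    clearstatcache(true, $lockFile);
    $mtime = @filemtime($lockFile);
    if ($mtime === false || time() - $mtime < $maxAge) {
        return false; // no lock file, or it is still fresh
    }

    // Only one process can create the '.stale' file, so only one does the cleanup.
    $fp = @fopen($staleFile, "xb");
    if ($fp === false) {
        return false; // someone else is already cleaning up
    }
    fclose($fp);

    // Re-check the age in case the lock was replaced while we were getting here.
    clearstatcache(true, $lockFile);
    $mtime = @filemtime($lockFile);
    $removed = false;
    if ($mtime !== false && time() - $mtime >= $maxAge) {
        $removed = @unlink($lockFile);
    }
    @unlink($staleFile);
    return $removed;
}

// Usage with a hypothetical base filename and a 60 second expiry.
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "stale_demo_" . getmypid();
@unlink($base . ".lock");
@unlink($base . ".stale");

touch($base . ".lock", time() - 120);       // pretend the lock is two minutes old
$removedOld = removeStaleLock($base, 60);   // old lock gets cleaned up
$oldGone = !file_exists($base . ".lock");

touch($base . ".lock");                     // a brand new lock
$removedFresh = removeStaleLock($base, 60); // fresh lock is left alone
$freshKept = file_exists($base . ".lock");
unlink($base . ".lock");
```

Because the age check relies on comparing time() to the file's timestamp, this is exactly where out-of-sync clocks on a shared file system would bite.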
The downside to this whole approach is performance. A file-system based solution isn't going to be nearly as fast as a native kernel solution. However, named mutexes aren't speed-demons to begin with. I'm after something that works, not something that performs spectacularly.
However, the end result is actually a lot simpler than I originally thought it would be. It meets all my needs and addresses my concerns. And, perhaps more interestingly, this solution is easily portable to other languages too. The only requirement is being able to open a file for creation and return a failure condition if it already exists.
The source code to WebMutex is under the same license as WebCron (MIT or LGPL, your choice). You'll have to download WebCron to get at the source code for WebMutex.