Monday, October 11, 2010

Forget flock() and System V Semaphores - use WebMutex instead

While developing my latest PHP-based project, WebCron, I ran into an issue that has bothered me for a while - atomicity. An atomic operation is one where only one thread of one process is allowed to execute some piece of code. Actually, even under C/C++, I've been bothered by this issue.

Unlike Linux, Windows really has the most friendly approach to creating an environment where atomic operations may thrive. Named mutexes is one area where Windows really, truly shines above all the OSes out there. Try porting CreateMutex() to another OS and you'll inevitably have some real head-scratching sessions when you try to do a cross-process, named mutex. So-called 'mutexes' under *NIX OSes are usually 'pthread'-based, which are really more in line with Windows "critical sections" than "mutexes". A programmer coming from the Windows world is going to be utterly confused because they've been pampered by Microsoft and no one in the Linux community sees any benefit of adding REAL mutexes to the kernel.

The problem lies in the fact that the Windows OS itself manages the mutexes. When you want to lock a mutex, a jump into kernel mode is made and the kernel manages the locking process. If a process or thread goes away while holding onto a locked (signaled) mutex, the Windows kernel potentially eventually cleans up the mess left behind. And here is the key point: There is no equal process on any other OS and definitely nothing that is cross-platform (natively). Or at least no single approach that has been "standardized".

Let's look at the options available for creating something close to a named mutex. I'll be specifically focusing on what is available for PHP:

flock() - This function is the first recommendation by the PHP developers as their "cross-platform" "solution". If you read the documentation and the comments carefully, it quickly becomes apparent that it does NOT behave the same under all OSes, especially Windows. Here are the core problems: The non-blocking option is ignored (i.e. doesn't work) on Windows, the return values are not consistent across OSes, it has issues with multi-threaded web servers, and it requires a file to already exist (in advance) on the system to be used as the "lock file".

System V Semaphores - These functions aren't available for Windows, requires an integer instead of a name to identify the semaphore (seems hacky to me), and the functions are not compiled into PHP by default.

For WebCron, what I needed was a mutex object that provided a cross-platform, cross-process, cross-thread, non-blocking, expire-capable, multi-lock capable, named mutex. When developing a product that will be used on who knows what OS, it has to at least support "The Big 3": Linux, Windows, Mac. Clearly the above functions weren't going to cut it for my needs. I needed something else.

I scoured the PHP documentation for a while. I knew the key was that the OS must manage the creation of the mutex, not the application. I knew that if the application managed the mutex, there will be race condition issues and other nastiness involved.

Then I found the solution I was looking for: fopen().

Every OS manages the file system and all access to files. However, the 'w' and 'r', 'a', and 'c' options of PHP automatically create a file if it doesn't exist. And then I found...'x'. It creates a file if it doesn't exist and will return false if it does. The '@' prefix suppresses errors and warnings. Since the OS manages the creation of the file and since file system access is universal, I had found my cross-platform solution (i.e. silver bullet). And WebMutex was born.

The first hurdle was deciding how to handle the lock files. I ended up opting for using an absolute path and filename and then appending '.lock'. I call @fopen($filename . ".lock", "xb") and check the result. If it is false (i.e. some other process beat this process to it), then I don't have the lock, sleep for a random amount of time, and try again. Releasing the lock was a matter of deleting the file. If Lock() is called multiple times, I just increment an internal counter instead of touching the file system again.

My next problem was that PHP scripts can only run for 'x' seconds before PHP forces the script to exit. PHP doesn't run cleanup code when it forcefully stops execution. So I had to come up with a way to delete the lock file. I eventually settled on creating another file using the fopen() 'x' option with '.stale' appended to the base filename. That way there aren't multiple processes deleting the lock file - potentially introducing a race condition that allows multiple locks to be obtained. Of course, it becomes possible to end up with both '.stale' and '.lock' files. However, that should be rather rare and requires manual intervention anyway.

The solution I came up with to the second problem isn't without its own problems though. If the file system is shared and there are two processes running on multiple machines and the timestamps of those machines aren't perfectly in sync, the lock file might get deleted prematurely.

The downside to this whole approach is performance. A file-system based solution isn't going to be nearly as fast as a native kernel solution. However, named mutexes aren't speed-demons to begin with. I'm after something that works not something that performs spectacularly.

However, the end result is actually a lot simpler than I originally thought it would be. It meets all my needs and addresses my concerns. And, perhaps more interestingly, this solution is easily portable to other languages too. The only requirement is being able to open a file for creation and return a failure condition if it already exists.

The source code to WebMutex is under the same license as WebCron (MIT or LGPL, your choice). You'll have to download WebCron to get at the source code for WebMutex.

8 comments:

  1. Looks interesting, but it's a shame that PHP doesn't offer these kind of things in the standard libraries.

    ReplyDelete
  2. @Florin - Well, I make interesting software. :) It is a shame there is no Mutex class but we have to work with what is available. I gave my class the name "WebMutex" in case the developers create a "Mutex" class. I didn't want to have a future naming conflict.

    ReplyDelete
  3. quote: "Of course, it becomes possible to end up with both '.stale' and '.lock' files. However, that should be rather rare and requires manual intervention anyway."

    That is unacceptable and a major failure in your code. It is also something that is fairly easy to correct. The easiest way to do it is to just delete the .share file if it is more than some amount of time old, probably 2 minutes. This should suffice, although it is way too long for a real-time usage where code is executing in less than a second, or even several seconds. (unless it doesn't matter that you can't get a lock, because you "don't mind waiting", or "you have an alternate route of logic and operation to take")

    I don't know if assuming the file system "exact in its file I/O operations" is "clean" enough to use as an absolute, either. I think there may be situations where "confusion" may cause one instance to think it got the lock, when in reality some other instance thinks it got it, as well. (I think the more instances that are in contention, the higher the probability of failure)

    I would suggest that you don't use the mere presence of a file as the lock, but create something like a GUID as an OWNER ID (possibly with an N-digit random number appended) , and store that in the files, and use that as part of the logic. It requires more file I/O, because I would close and then re-open the file to read the GUID as a "double-check", but is probably closer to an absolute solution without any chance of "confusion".

    Beyond that, I don't know if you have implemented a true mutex algorithm that will work. I think it is too simplistic, and would probably fail at some point, even with my suggestions.

    ReplyDelete
  4. @Popeye - I'll address your concerns.

    1) If you end up with both .stale and .lock files, your server is hosed and probably in dire need of a reboot. It is most likely under either a DDoS attack or Reddit/Digg/Slashdot/etc. At the very least, CPU and RAM usage would be pegged at 100%. The .stale file is used to control deletion so that only one instance of the script deletes the .lock file. Imagine three scripts running nearly simultaneously - script one and two see that .lock is old. Script one deletes the .lock file, then script three sees that it can obtain the lock and creates a .lock file and continues. Script two then deletes the .lock file. Then script one sees that it can obtain the lock and creates a .lock file and continues on. Oops! Two processes have acquired the same lock. .stale is only used if the lock is configured to expire, which I typically set to maximum PHP execution time + 5 seconds. I'm using this as part of WebCron, which sets PHP's maximum execution time to 60 seconds with a preferred 30 second timeout. .stale is technically optional - the .lock file could stick around forever until manually deleted. It was written this way because it is technically the most correct approach regardless of how you or I feel about it.

    2) False. The OS manages the file system atomically. You are guaranteed, when requesting exclusive creation access to a file, that the file is created exclusively. The OS would have to be severely broken to not behave properly.

    3) False. And no need. The OS manages the file - writing data to it does nothing useful. That's the beauty of this solution - the OS that hosts the file system manages the file.

    4) It IS simplistic, but anything more complex would be broken. This is also the ONLY portable/cross-platform solution.

    I would much prefer a real named mutex solution implemented into the kernel of every major OS, BUT that requires every OS vendor to get their act together and then the PHP developers would need to implement the relevant APIs. Perhaps in another decade we'll see something like this happen.

    ReplyDelete
  5. Sorry to burst your bubble, but why do you believe this solution is portable between operating systems? It would be really strange if you nobody had thought of this if the solution is so simple, right?

    The simple answer is of course that creating a file, or sometimes a directory is the way to implement things like flock. That is provided the file system support atomic creation of files in the first place. Atomic creation of directories are more often supported than atomic creation of files, but in some cases this option is not open at all. The absence of atomic operations is indeed the reason why it is so hard to build a "simple" CreateMutex that is portable.

    ReplyDelete
  6. @Per-Anders Staav - All major OSes implement file and directory creation atomically. As a result, this is a portable solution.

    As to why not many people have discovered this until now is because not many people need such a solution and flock() in blocking mode generally suffices for most folks. Only when someone tries to develop a cross-platform, non-blocking mutex in PHP does flock() fall apart. My solution works. And, even though it is simple, it works well.

    ReplyDelete
  7. The solution I'm searching for is quite different, cause of the distributed needs. I also need a semaphore and not just a mutex. Semaphores manage a kind of waiterslist. The best solution of all is off course the monitors, but be realistic...
    My needs are for my current distributed PHP/PostgreSQL API project. There will be many PostgreSQL 9 servers in a synchronous multi-master solution. And with each PostgreSQL server will come an Apache with the PHP REST API.
    Some processes will absolutely need to access critical sections of the code, where only one process amongst all servers will go in at the same time...

    Also, there will be so much requests per second that using files will certainly be a bit too slow, don't you think so ?

    All those servers will run Linux distributions, so at this time, DIPC/KDIPC that offer distributed IPC System V semaphores seems to be the best solution, but again, there'll be a problem with our Linux vserver solutions, cause we'll need to compile a patched kernel on all host machines, and I'm not sure it'll work fine with vservers. I also thinked about implement a semaphore system using PostgreSQL tables (one for the waiters list, and another for the semaphore counter). The P and V primitives will use the tables or tables rows lock...

    Still thinking, searching and open to all ideas ;-)

    ReplyDelete
  8. @crochat - A Mutex is just a special case of a Semaphore. It sounds like your needs are beyond what my little solution can provide. In your case, you have the added bonus of having full control of the box. Consider writing an asynchronous named mutex/semaphore TCP/IP server.

    Look specifically at writing a nginx module. nginx is a fantastic asynchronous TCP/IP server and you can write your own modules for it. An nginx-based solution would completely bypass PHP, be reliable under high-performance scenarios, work anywhere that nginx compiles, and, last but not least, only need to sit on one of your servers. PHP would connect to the one server managing your global semaphore(s) prior to using the relevant database. It would be fast and efficient since all the mutexes/semaphores would reside in RAM.

    ReplyDelete