Thursday, July 31, 2014

An adventure in writing a PECL extension for PHP

Okay, this isn't so much a guide on how to write a PECL extension as it is a discussion of my recent experience writing a PHP extension, publishing it, and documenting it. My hope is that others can benefit from reading about the struggles I went through.

Writing an extension for PHP requires serious skills and patience. Extension writing is a complex macro dance and really requires good planning to pull off successfully. In my case, I wanted to introduce native named synchronization objects into PHP. This is something I've felt has been missing from the language for far too long and I didn't see anyone else doing work in the area. The first thing I did was write a cross-platform library in C++:

Cross-platform C++ library

Writing a C/C++ library as a proof-of-concept is a good start to writing a PHP extension. I highly recommend it. Developing a separate library allowed me to work out most of the kinks in the logic apart from the Zend engine. After I solidified the working model, I began work on the extension itself. I had to port the library from C++ to C, but that was a fairly trivial operation.

The other major thing I did in advance of writing the extension was occasionally dig deep into the PHP source tree to figure out how functions actually worked behind the scenes. Some functions are so hopelessly complex (e.g. file handling because of URL wrapper support) that they are simply too dense to understand. Other functions, however, make for great snippets to commit to memory. The prerequisite time with the PHP source code before writing an extension is, IMO, about 3 to 4 months of casual interaction. That is just long enough to feel comfortable navigating the PHP source tree. If there is one thing I learned here, it is that every function in PHP is part of an extension, which may come as a surprise to many people.

The frustrating thing about PHP extension writing is that there is almost no documentation on how to go about doing it. There's a book (dead tree edition) by Sara Golemon on the topic and little else beyond a few minor, slightly dated blog posts. I ended up doing what most extension writers do - scouring the source code of other extensions to cannibalize specific ideas for my own. "Simple" things in normal C, such as returning a value, become a Zend macro in PHP. Knowing which macro to use and what all the crazy options do is the hard part. Since I was making a set of object-oriented classes in PHP instead of functions, the amount of available documentation approached zero very quickly. Scouring other extensions' code alongside their public documentation on php.net helps to figure out which macro is probably the right one. Having a good understanding of the source tree structure went a long way toward figuring things out on my own. Ultimately, extension writing for PHP is a fairly dark art, which may explain why there aren't a ton of extensions out there. I got the distinct impression that the developers like it that way, since it requires a minimum level of software development competence before work on an extension can begin.
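To make the "returning a value becomes a macro" point a bit more concrete, here is a toy sketch in plain C. It is not the Zend API and does not include any PHP headers. It is only loosely modeled on the way Zend's RETURN_* macros fill in a return_value slot and then leave the function. Every name in it (toy_zval, TOY_FUNCTION, TOY_RETURN_LONG, toy_double_it, toy_demo) is made up for illustration.

```c
#include <assert.h>

/* Toy stand-in for a PHP value. The real zval is far richer than this. */
typedef enum { TOY_NULL, TOY_LONG, TOY_BOOL } toy_type;

typedef struct {
    toy_type type;
    long     lval;
} toy_zval;

/* Engine-style functions return void and receive a slot for their result. */
#define TOY_FUNCTION(name) void toy_##name(long arg, toy_zval *return_value)

/* "Returning" means filling in the slot and then leaving the function. */
#define TOY_RETVAL_LONG(l) \
    do { return_value->type = TOY_LONG; return_value->lval = (l); } while (0)
#define TOY_RETURN_LONG(l) \
    do { TOY_RETVAL_LONG(l); return; } while (0)
#define TOY_RETURN_FALSE \
    do { return_value->type = TOY_BOOL; return_value->lval = 0; return; } while (0)

/* Hypothetical function: doubles a non-negative number, otherwise yields false. */
TOY_FUNCTION(double_it)
{
    if (arg < 0) {
        TOY_RETURN_FALSE;
    }

    TOY_RETURN_LONG(arg * 2);
}

/* Caller side: the "engine" owns the result slot and inspects it afterwards. */
int toy_demo(void)
{
    toy_zval result = { TOY_NULL, 0 };

    toy_double_it(21, &result);
    assert(result.type == TOY_LONG && result.lval == 42);

    toy_double_it(-5, &result);
    assert(result.type == TOY_BOOL && result.lval == 0);

    return 0;
}
```

The thing to notice is that the C function itself returns void. The value the script would see travels through the slot, which is why a plain C "return x;" is not enough and a macro takes its place.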

That said, the PHP documentation on extension writing does a decent job of getting developers started. The 'ext_skel' script makes a mostly working skeleton for a new extension. Not bad. I think the main issue I ran into regularly was that ./buildconf has to be run with the --force option until the configuration file is finalized when using the release builds of the PHP source tree.

I highly recommend developing an extension on Linux first and then porting it to Windows. The compiling environment on Linux for PHP is far superior to the Windows build environment. However, if you are like me and prefer Windows text editors and IDEs (i.e. don't like Linux editing tools), do what I did: use WinSCP as the go-between and edit the source code in your favorite Windows-based text editor. That worked out pretty well for me. And since I had worked out all of the core issues with my extension in a separate library (which allowed me to fire up Visual Studio for real debugging), nearly all of the extension writing process was porting the code (easy) and writing plumbing to connect it to Zend (more difficult).

I also developed a small test suite to validate that the code was working as expected. I highly recommend making a small test suite as it helps catch bugs in the code.

Once the Linux version was done, I went and tested it on Windows. The Windows build system, as mentioned earlier, is more fragile. One wrong character in the wrong place and the whole Configure.js script will blow up without specifying why. Setting up the Windows build environment is also a bit more difficult as there are several distinct pieces that have to be in the right place. However, once everything was in place, it built just fine. Again, a small test suite can come in very handy for tracking down bugs.

At this point, the extension was developed but my adventure was only beginning. See, I wanted it to be a PECL extension. If I were simply satisfied with just having a PHP extension that anyone could compile into PHP, that would be the end of it. However, PECL sprinkles on some special magic that transforms an extension into something that people want to use because an extension is suddenly easy to install via "pecl install extensionname" and then PECL handles downloading, extracting, compiling, and installing the extension. Also, package maintainers for major OSes like Ubuntu will pick up the extension and make it easy to install with package management tools like 'apt-get'. There are huge visibility advantages to deploying an extension via PECL.

Releasing on PECL requires approval. The PHP devs are rather stringent about who they let in, so I knew I needed to make the new extension relatively awesome. I got on IRC and the PECL dev mailing list and started several discussions. This is a very important step as there are conformity issues that will crop up. It is important to be super-flexible and willing to make changes to the code. I ended up moving a GitHub repo around and revamping a lot of code during this process. Once everyone seemed to be cool with the work that had been done, it was time to apply for access to PECL, PHP, and documentation repos via the PECL signup form. It's a completely laid-back process - therefore, after applying, I recommend just finding another project to work on. It can take up to two months for the devs to get around to accepting new users and getting those users set up with the appropriate level of access.

The hardest part about developing a PECL package is figuring out what the "correct" way to develop an extension is. If writing a regular extension is a somewhat obscure task, the process of releasing a PECL package is even more so. This isn't really anyone's fault since there isn't a whole lot of need for new general-purpose extensions for the language to begin with. So hammering out a good guide is a bit of a low priority when there are other, more pressing matters to attend to. Plus, I'd wager that the lack of a guide raises the bar to entry somewhat significantly. PHP is software used on millions of hosts, so it needs to have some semblance of quality control applied to it. Obscuring the process of writing and releasing an extension is a pretty good solution to that problem.

At any rate, once the PECL account is approved, a whirlwind of activity happens. In general, I had already generated a PECL package but I had to regenerate it after rewriting parts of my 'package.xml' file. I had also looked at what doing documentation would require. But by the time the approval actually happened, I had kind of forgotten what I had done, so I made sure the test suite still passed and the extension still built as a sanity check. In all, it only took about one weekend to do the actual release and documentation cycle. Again, looking at how other extensions do things helps a lot with creating a consistent experience.

Here's the PECL package:

http://pecl.php.net/package/sync

Here's the documentation:

http://us1.php.net/manual/en/book.sync.php

The original version 1.0.0 of the extension had a bug, caught by the PHP dev team, that only showed up on some hosts. I tracked it down, fixed it, and sheepishly released the 1.0.1 version of the extension. Interestingly, the 1.0.1 version received an "automatic" Windows DLL. I didn't build it, but I suspect the team was waiting for a fix for the bug before letting the system do the Windows build. Also, within five minutes of 1.0.1 being released there were 40 downloads according to the PECL stats page. I assume there are automated processes sitting on the announcements list looking for new package uploads to PECL - either that or crazy people.

The documentation writing bit was a different experience too. Good documentation includes code examples that cover real-world scenarios (i.e. not contrived). The PHP documentation is written in a giant set of XML files. The primary way to introduce documentation is via Subversion. The main way to adjust the documentation is via a custom web-based GUI that the team has come up with. At the time of this writing, there is a site that publishes the latest documentation every six hours and the main PHP website and mirrors are updated every Friday. The GUI is pretty neat in that it attempts to manage the translation teams and tracks which bits of documentation are no longer building (because XML is pretty fragile). Because I was introducing new documentation into the tree, it was far more efficient for me to use the Subversion route. I made sure that my changes built locally without issues before committing them back into the main repository. Because so many people are involved in PHP development, it is very important to tread carefully when committing anything (code or documentation) and try to avoid breakages.

In my opinion, the extension's source code is simple enough to serve as a tutorial for those interested in writing object-oriented extensions for PHP: it does something useful without being overly complicated. I would study the C++ library first to understand that code. Then the similarities between it and the PECL package stick out and it becomes easier to understand the more obscure Zend bits.

Hopefully these tips help someone out with their extension writing efforts.
