
An alternative, better approach for writing technical documentation

Technical documentation for a software project is a complex topic. I've seen all sorts of approaches throughout the years and implemented many of them myself. I've seen everything from a "no documentation found here" approach to using a Wiki to hundreds upon hundreds of pages in a PDF (or other specialized format like CHM) that no one really reads. In addition, technical documentation almost never mirrors reality. The source code is always the definitive authority on what happens. Technical documentation is mostly an afterthought, "Oh, yeah, I guess I better document that thing I wrote."

Most users of software can't be bothered to read the source code OR the source code is proprietary and only available in binary form or through remote means (e.g. a REST API). Users want something more human-readable than source code to look at anyway, even if the source code is definitive and, in many cases, more readable than the technical documentation.

First I want to talk about the decision to have no technical documentation whatsoever. Expecting users to read the source code to figure out how all of the various options work severely limits the available audience that will use a product. For some developers, this is a plus: Only really dedicated and talented individuals who are willing to invest the time and effort will show up and they will get to really know the ins-and-outs of the product. However, most people will view the lack of technical documentation as a sign of an immature product and/or an immature software developer. It certainly is easier to not have to maintain one more thing but people appreciate some effort to document a project with human-readable text on its operation.

Next up on the chopping block is the Wiki. Using a "the community will write my documentation for me" Wiki approach does not work at all. Every time I see a Wiki where there should be formal technical documentation, I also see a slew of garbage edits and commentary that result in nearly incomprehensible mumbo-jumbo that is quickly outdated within 6 to 12 months. For example, someone will inevitably use the Wiki to inappropriately report a bug in the software that later gets fixed when someone actually posts it to the bug tracker. Another example is that someone will write some incomplete comment/thought between two coherent paragraphs of documentation, creating a rather jarring effect for readers just looking for information. Other than Wikipedia, which I rarely visit any more thanks to its toxic community, nearly all Wiki software is painfully slow (bad programming/database design), editing is awkward (especially for writing formatted code examples), and results are sub-par compared to the use of automated tools. Letting Joe Schmoe freely edit documentation and expecting good results is a terrible idea because he's most likely a spambot (uh, apologies if your name is actually Joe Schmoe). The quality of what people put into a Wiki is all over the map but is mostly terrible. This goes to show that there has to be some quality assurance process involved for good documentation to happen.

Using an automated tool is another approach that is widely used. I've never been fond of tools like Doxygen that require altering code to document each and every function and, in severe cases, end up polluting the code with more comments than actual code. Documentation tools also don't really help other developers know which functions are actually important, and working usage examples end up being fairly sparse due to the reliance on the tools themselves. Now I'm sure some people think, "None of that is a problem." However, the number of lines of code + comments really starts to add up, and that overhead doesn't make much sense for interpreted languages. I do a lot of web development in PHP and JavaScript. Comments bloat both of those languages rather significantly and so various people have come up with "minifiers," which are glorified code mangling, comment removal, and whitespace removal tools that turn the code into an unreadable, undebuggable mess. In addition, automated documentation tools require comments to be written in a very strict format so that the tool can detect the documentation, and my experience, especially with Doxygen docs, is that most developers only fill out half of the information that a user will actually need because developers tend to be lazy around the point where they start writing documentation (see intro to this post). The last thing to note here is that just because documentation is there in the actual source code still doesn't mean that users will actually read the source code. I see a lot of weird fears and trepidation about reading other people's source code like it's sacred, holy, and untouchable territory. Just read the source code already! It's not hard (usually), plus it gives you some insight into what makes the developer tick and whether or not you want to interact with the developer in the future when you run into actual issues with their software. After all, a developer's personality matters and their source code speaks volumes about who they are and whether or not they care. I digress.

Do I even need to talk about the awfulness of PDFs, CHMs, manpages, HTML generators, and XML formats like DocBook? I've seen 500+ page PDFs, proprietary software to generate awkward binary formats like CHM, the attempt to shoehorn XML into an arena that it's not really well-suited for, generators to output thousands of HTML files (Doxygen's guilty here), and so on. Most of these sorts of options break easily (i.e. are rather fragile) and outdated copies tend to float around for a long time. Nothing really beats a set of handcrafted technical documentation that carefully extracts the best and most commonly used parts of an API/library/application in a format that is pleasing for the user to interact with.

StackOverflow, not too long ago, attempted to "solve the problem" of writing and maintaining technical documentation by adding Documentation Beta. The response has been a real mixed bag, as is to be expected. I generally take a "let's wait and see" late-adopter approach to newfangled things like that before diving in. StackOverflow more recently announced a significant overhaul of the Documentation system due to a variety of unanticipated issues, which reset any sort of mental countdown timer I might have had in my head. I don't care much for StackOverflow and StackExchange in the first place for a variety of reasons, but mostly because their moderator community is toxic, so why would I put my documentation on their website?

For most of my life as a software developer, I tightly controlled all technical documentation and didn't let anyone but myself touch it. After all, the software I designed was my naked newborn baby and daddy's got control issues. (I only apologize for the fact that I couldn't come up with a more awkward and poorly written analogy.) Anywho, for the most part, I've kept my documentation in sync with the software I write and have done a pretty decent job with its structure so there's not much to complain about. After many years of wrestling with what to do about technical documentation whenever I came to crossing that bridge, I recently stumbled onto the best solution to date: GitHub. More specifically, GitHub Markdown.

Within my GitHub repositories, I now create a 'docs' subdirectory and store all of the relevant project documentation there in the GitHub Markdown format using .md files. Users can view the documentation on GitHub in a nice, neat, organized format. GitHub automatically formats and syntax highlights source code examples, which I've found actually encourages me to write more examples. I can also commit changes to my documentation more easily as I make changes to the software I write, without polluting the source code itself with unnecessary comments or having to wait for a software release to happen before updating the documentation. In addition, the documentation is included when users download the whole repository, which is great for anyone who prefers offline use without necessarily being stuck with outdated or weirdly formatted documentation floating around on their system (just 'git pull' the latest and open it in any text editor). It is also in version control, which means it benefits from things like tagging and branching as well as the usual version control features. Users can also start pull requests or open an issue on the issue tracker to request changes to the documentation and even just fork it if they think they can do a better job. This approach achieves an excellent balance for technical documentation within a wide variety of projects, including open and closed source software as well as APIs, while maintaining a solid level of quality assurance by moderating the changes through trusted means. This solution also obsoletes the GitHub Wiki, which I've only ever seen used to some success when only project owners have access, but that unfortunately excludes the larger community from making any changes. Personally, I try to remember to disable the Wiki option whenever I create new repositories on GitHub. The GitHub Markdown project documentation approach is a win for everyone!
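To make the 'docs' layout a little more concrete, here is a minimal Python sketch that builds an index page for a folder of .md files. To be clear, this is an illustration and not part of my actual workflow: the function names (`first_heading`, `build_index`), the "Documentation" title, and the idea of generating a `docs/README.md` index at all are assumptions on my part. It relies on the fact that GitHub renders a README.md when you browse a directory, and it naively treats the first line starting with `#` as the page title, so a `#` comment inside a code example near the top of a file could fool it.

```python
from pathlib import Path


def first_heading(md_file: Path) -> str:
    """Return the text of the first line starting with '#', or the file stem."""
    for line in md_file.read_text(encoding="utf-8").splitlines():
        if line.startswith("#"):
            title = line.lstrip("#").strip()
            if title:
                return title
    # No usable heading found: fall back to the file name without extension.
    return md_file.stem


def build_index(docs_dir: Path) -> str:
    """Build a Markdown bullet list linking to every .md file in docs_dir."""
    lines = ["# Documentation", ""]
    for md_file in sorted(docs_dir.glob("*.md")):
        if md_file.name == "README.md":
            continue  # skip the index page itself
        lines.append(f"- [{first_heading(md_file)}]({md_file.name})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: run from the repository root, docs live in ./docs
    docs = Path("docs")
    if docs.is_dir():
        (docs / "README.md").write_text(build_index(docs), encoding="utf-8")
```

Run from the repository root, this rewrites `docs/README.md` with one link per documentation file. Because the links are relative, they work both on GitHub and in an offline clone.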

Well, not quite everyone. The only downside I see to relying on the GitHub Markdown approach is that GitHub is now accidentally being used to host a website that will result in more traffic from people just wanting to read documentation online. The nice folks over there at GitHub may or may not take too kindly to their system resources being used this way. Of course, if it becomes a significant problem, I'm sure a solution will present itself as such things frequently do.
