
Posts

An alternative, better approach for writing technical documentation

Technical documentation for a software project is a complex topic. Over the years, I've seen all sorts of approaches and implemented many of them myself: everything from a "no documentation found here" approach, to using a wiki, to hundreds upon hundreds of pages in a PDF (or another specialized format like CHM) that no one really reads. In addition, technical documentation almost never mirrors reality. The source code is always the definitive authority on what happens. Technical documentation is mostly an afterthought: "Oh, yeah, I guess I better document that thing I wrote." Most users of software can't be bothered to read the source code, OR the source code is proprietary and only available in binary form or by remote means (e.g. a REST API). Users want something more human-readable than source code to look at anyway, even if the source code is definitive and, in many cases, more readable than the technical documentation. First I want ...

The Star Trek stardate for a better UNIX timestamp

A large body of software will break in unusual ways on January 19, 2038. This is known as the UNIX 32-bit timestamp bug. However, UNIX-style timestamps are used in all OSes, so it is a global phenomenon and a poorly designed software defect. 64-bit timestamps merely extend the problems that 32-bit timestamps already have with identifying what day it is (i.e. what the end user actually cares about). Perhaps there is a better timestamp we should have been using all along: Star Trek stardates. Formal representations:

32-bit: 1 sign bit + 14-bit "day" + 17-bit percentage
64-bit: 1 sign bit + 31-bit "day" + 32-bit percentage

If we were using Star Trek-style stardates for date/time storage, our timestamps would have better precision, and the upcoming 32-bit software problems would have happened in early 2014 instead of waiting until 2038 for the breakage to happen. That is, force the people who created the problem to clean up their mess instead of letting them retire an...
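The excerpt doesn't spell out the exact encoding, but here is a minimal PHP sketch of one plausible reading of the 32-bit layout above (1 sign bit, 14-bit day count, 17-bit fraction of the day). The epoch choice and function name are hypothetical, not from the post:

    <?php
    // Hypothetical packing of the 32-bit stardate layout described above:
    // [1 sign bit][14-bit day number][17-bit fraction of the day].
    // $epoch is an assumption; the excerpt doesn't say which day is "day 0".
    function unix_to_stardate32($unixts, $epoch = 0)
    {
        $secs = $unixts - $epoch;
        $sign = ($secs < 0 ? 1 : 0);
        $secs = abs($secs);

        $day  = intdiv($secs, 86400) & 0x3FFF;                      // 14 bits of days.
        $frac = (int)(($secs % 86400) / 86400 * 0x20000) & 0x1FFFF; // 17-bit fraction.

        return ($sign << 31) | ($day << 17) | $frac;
    }

    // 2^14 days is roughly 44.8 years, which is why a 32-bit stardate counted
    // from the 1970 UNIX epoch runs out in 2014 rather than 2038.
    printf("%08X\n", unix_to_stardate32(time()));

Note that intdiv() requires PHP 7; on older PHP, (int)($secs / 86400) does the same job here.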

Secure web server permissions that just work

I have been doing web development since, well, web development basically began. And I've used a wide range of hosts. Since I don't see anyone stating answers succinctly and definitively anywhere, it is time to write a solution to the question on everyone's mind: What permissions should I set for web server directories and files? The first step is to identify the user that the web server runs as and accesses files with. For example, many Linux distributions set up 'www-data' as the user. I'll be focusing mostly on Linux as it powers about 66% of the Interwebs, but Windows Server users can benefit too. It is important to get your setup correct from the very beginning. Propagating permissions down the website tree as new directories and files are created is critical to maintaining sanity. Knowing who created a specific file or directory is also important when working in a team. As always, if you can't trust other users who might have a...
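The excerpt cuts off before the post's actual answer, but a minimal sketch of a common baseline that matches its stated goals (a 'www-data' group that can read the tree, permissions that propagate to new files, and clear file ownership per team member) might look like this. The site path and the 'deployuser' account are hypothetical placeholders:

    # Hypothetical site root and deploy account; adjust to your layout.
    chown -R deployuser:www-data /var/www/example.com

    # Directories: rwxr-s---; the setgid bit makes new files inherit the
    # 'www-data' group, which is what propagates permissions down the tree.
    find /var/www/example.com -type d -exec chmod 2750 {} \;

    # Files: rw-r-----; the web server can read but not modify them.
    find /var/www/example.com -type f -exec chmod 0640 {} \;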

DNS cache spoofing/poisoning is useful for web developers

When most people hear the word "poison" they immediately conjure up bad things in their mind from some weird crime drama that they watch on TV. DNS cache poisoning (or spoofing) is generally considered a bad thing because it means that a domain name is resolved to the "wrong" IP address. It is usually used in terms of an attacker who gains access to a DNS host to deliver the wrong responses to DNS requests, or who intercepts and alters responses to requests, which then points the client at the wrong IP address. DNS cache poisoning, however, can be used for a few positive, legitimate things. Let's say you want to relaunch a website on a different web host. To do this, you could develop it locally, upload the files to the new host when you are finished, switch DNS over, and watch it break spectacularly. But if you want to get a relaunch 95% right, you need to see the new website before DNS is switched over. To do this, DNS cache poisoning comes to th...
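The excerpt ends before naming the mechanism, but the simplest way to "poison" your own machine's name resolution is the hosts file, which overrides DNS locally. The IP address and domain below are hypothetical placeholders:

    # /etc/hosts on Linux/macOS, or
    # C:\Windows\System32\drivers\etc\hosts on Windows.
    # Point the production domain at the new host before switching DNS:
    203.0.113.10    www.example.com example.com

Remove the line (and flush your local DNS cache) when you're done testing, and the domain resolves normally again.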

WTFPL is harmful to software developers

Occasionally, I run into a piece of software that utilizes an inappropriate license with a crude title. Today I want to talk about one of those licenses. It is called the WTFPL and it is harmful to any software developer that uses it. I don't use foul language even among impolite company, so I'm not going to copy the license text here. You can read it if you want, but it isn't necessary. There are about 300,000 words in use in the English language at any given time. Of those, about 200 words are considered to be rude, crude, foul, and generally inappropriate to use in most settings. The words a person chooses to use in casual conversation say a lot about them. Language issues aside, the basic gist of the WTFPL license is that you can do whatever you want with the software that the license is associated with. If you look at a traditional software license (aka EULA) with its many pages of text and the various "license wars" out there, the idea behind the WTFP...

Virtual Private Servers (VPS) and Cloud hosting are now viable

For many, many years, I was a massive fan of dedicated web hosting. I was VERY vocal about how you couldn't run a legitimate, professional business without using dedicated web hosting. And time and time again, I was proven right as people came out of the woodwork in various places who had bet their business on shared web hosting and lost - and sometimes they lost EVERYTHING, including their business and all their customers! Shared web hosting is still the bottom-of-the-barrel, scummy/scammy money grab that it has always been, and no respectable business should be caught dead running their web infrastructure on it. Period. That hasn't changed. However, I have been watching a couple of new stars grow from infancy into their own over the past 8 years: Virtual Private Servers, aka VPS, and their newer, shinier cousin, Cloud Hosting. Dedicated web hosting is expensive. It always has been, because you get a piece of hardware, a network drop, electricity, a tran...

You can still buy a brand new Dot Matrix printer...

Today, I learned that people still buy brand new dot matrix printers. You know, those extremely noisy printers I thought we ditched as soon as it was possible to do so. Well, except for the nutcases who turn them into "musical instruments" and start a YouTube channel. But, no, sales of brand-new(!) dot matrix printers are apparently still, relatively speaking, alive and well: Dot matrix printers on Newegg. After doing some research, it turns out that, for bulk printing where output quality and "professional" appearance don't matter at all, dot matrix printers can be anywhere from 4 to 8 times cheaper per printed page than laser printers (the next cheapest technology) when maintenance costs are amortized over the lifetime of each type of printer. With dot matrix, you're not going to get the speed, accuracy, or quietness of laser, but you'll supposedly save a boatload of money on toner. Maybe one day we will get a printer that combin...

Bulk web scraping a website and then reselling the content can land you in legal hot water

This interesting article on web scraping just came to my attention: New York Times: Auction Houses Face Off in Website Data Scraping Lawsuit. Summary: An auction house in New York is suing an auction house in Dallas for copyright law violations regarding scraping the New York auction house's website listings, including their listing photos, and then SELLING those listings and photos in an aggregate database for profit. As I'm the author of one of the most powerful and flexible web scraping toolkits (the Ultimate Web Scraping Toolkit), I have to reiterate the messaging found on the main documentation page: Don't use the toolkit for illegal purposes! If you are going to bulk scrape someone's website, you need to make sure you are legally free and clear to do so and that you respect reasonable rate limits and the like. Reselling the data acquired with a scraping toolkit seems like an extremely questionable thing to do from a legal perspective. The problem wi...

Setting up your own Root Certificate Authority - the right way!

Setting up your own Root Certificate Authority, aka Root CA, can be a difficult process. Web browsers and e-mail clients won't recognize your CA out of the box, so most people opt to use the public CA infrastructure. When security matters, though, using a public CA is the wrong solution. Privately owned and controlled CAs can be infinitely more secure than their public counterparts. However, most people who set up a private CA don't set up their CA infrastructure correctly. Here is what most private CAs look like:

Root CA cert -> Server cert

This is wrong because the server certificate has to be regenerated regularly (e.g. annually), and if the root certificate is ever compromised, it takes fairly significant effort to replace all of the certificates, including the root. What should be built instead is this:

Root CA cert -> Intermediate cert -> Server cert

In fact, this is the structure that most public CAs use. The root CA cert is generated on a machine that isn't conne...
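The excerpt cuts off before the post's actual procedure, but a minimal sketch of building that three-tier chain with the openssl command line might look like the following. The names, key sizes, and lifetimes are illustrative placeholders, not the post's recommendations:

    # Root key + self-signed root certificate (keep these on an offline machine).
    openssl genrsa -out root.key 4096
    openssl req -x509 -new -key root.key -sha256 -days 7300 \
        -subj "/CN=Example Root CA" -out root.crt

    # Intermediate key + CSR, signed by the root with CA:true so the
    # intermediate can issue server certificates itself.
    openssl genrsa -out intermediate.key 4096
    openssl req -new -key intermediate.key -subj "/CN=Example Intermediate CA" \
        -out intermediate.csr
    printf "basicConstraints=critical,CA:true\n" > intermediate.ext
    openssl x509 -req -in intermediate.csr -CA root.crt -CAkey root.key \
        -CAcreateserial -sha256 -days 3650 -extfile intermediate.ext \
        -out intermediate.crt

Server certificates are then signed by the intermediate, so the root key never has to touch a network-connected machine and a compromised intermediate can be replaced without reissuing the root.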

Scary but fun workaround for a Windows OS bug

I know I'm late for Halloween, but we are still between holidays. Back in 2012, I needed a way to start Apache in the background. No one had built anything (that I liked), so I over-engineered a solution: https://github.com/cubiclesoft/createprocess-windows

It takes the core CreateProcess() Windows API call, soups it up, and makes nearly every option directly available to command-line applications and batch files. Today I put the finishing touches on a wild new addition: the newest feature adds TCP/IP socket support, thanks to this StackOverflow post. What makes the new feature scary is that ANY IP address can be used, not just localhost addresses. Therefore, you can have it connect out to a server on the Internet or a LAN (or your IoT toaster?) and, in fine unencrypted fashion, route data to/from the stdin, stdout, and stderr of ANY process of your choice. Security? What's that? The feature was added to work around bugs in various scripting languages where they...

E-Ink Readers

Ever since e-ink came out, I've made an annual pilgrimage into the world of e-ink and e-readers and come away disappointed. This year is not really any different. Well, okay, it's actually worse, because various people out there who run e-reader blogs are now saying e-ink is a dying technology. You know it's bad when it comes to that. That's kind of sad, because the industry came SO incredibly and painfully close to minimizing or even eliminating power-hungry backlit displays. Had smartphones been delayed just one year (i.e. had people said "meh" to iOS as they should have), we would likely have color e-ink tech today that rivals LCD. That's rather frustrating. But now I need to digress and cover a few core problems with e-ink that have ultimately relegated it to the background. One of the core problems of e-ink is the refresh rate. A screen refresh in e-ink land is generally a giant flash to black, to white, to various bits of images popping in and out. It...

Shipping a gift? Nifty ideas for online delivery

Everyone loves receiving gifts. It's the thought that counts! Or something like that. However, most families these days tend to spread out across the country, and so we end up shipping gifts to each other. But they are delivered in a boring brown box with a generic label where the return address is Amazon or another online store. And the recipient might have ordered something else from Amazon or that online store too. And therefore they might open the gift before they are supposed to. It turns out that there is a simple solution that is pretty cool. You can't get away from the boring brown box, but you CAN do something about the shipping label on the boring brown box. Address labels typically look like:

First & Last Name
Address line 1
Optional address line 2
City, State ZIP
Country

I've bolded the parts that same-country shipping companies actually pay attention to. Let's hack the address label to make it do something useful for us that won't a...

Elegant iptables rules for your Linux web server

You've just upgraded to your first VPS or dedicated server and you think you've got all the software bits in place (LAMP stack and all that), OR you've moved from a hosting provider with easily configured hardware firewalls. Now you've read something somewhere that says you need 'iptables' rules with your new host. If you have any friends who manage Linux systems, you've also heard that "iptables is hard." In my experience, the only thing hard about iptables is that no one seems to publish decent rulesets, so users are left to figure it all out on their own. It doesn't have to be that way!

Ubuntu/Debian: apt-get install iptables-persistent
RedHat/CentOS: chkconfig iptables on

That installs the iptables persistence package (Ubuntu/Debian) or enables storing iptables rules (RedHat/CentOS) so that they load during the next boot.

Ubuntu/Debian: /etc/iptables/rules.v4 and /etc/iptables/rules.v6
RedHat/CentOS: /etc/sysconfig/iptables and /etc/sysconfig/ip6tables

Those editable...
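The excerpt cuts off before the actual ruleset, but here is a minimal sketch of what a rules.v4 file for a typical web server might contain. The ports and policy choices are illustrative, not the post's exact rules:

    *filter
    :INPUT DROP [0:0]
    :FORWARD DROP [0:0]
    :OUTPUT ACCEPT [0:0]
    # Always allow loopback and established/related traffic.
    -A INPUT -i lo -j ACCEPT
    -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # Allow SSH, HTTP, and HTTPS.
    -A INPUT -p tcp --dport 22 -j ACCEPT
    -A INPUT -p tcp --dport 80 -j ACCEPT
    -A INPUT -p tcp --dport 443 -j ACCEPT
    # Allow ping.
    -A INPUT -p icmp --icmp-type echo-request -j ACCEPT
    COMMIT

Load it with iptables-restore < /etc/iptables/rules.v4 (or reboot) and verify the active rules with iptables -L -n.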

The most interesting bug in PHP

The most interesting bug in PHP is the showstopper bug in the core of PHP that you finally run into after a month of software development, just as you are getting ready to ship a brand new product out the door. Specifically, PHP bug #72333, which is in all current versions of PHP. If you aren't familiar with reading C code, it can be extremely hard to follow along with that bug report, especially since PHP streams behind the scenes are ugly beasts to try to wrap your head around (mine's still spinning and I wrote the bug report). In short, the problem is a combination of non-blocking mode with SSL sockets when calling SSL_write() with different pointers in 'ext\openssl\xp_ssl.c'. The temporary patch in userland is to disable non-blocking mode when writing data - if you can - and I'm not so sure I can/should. The correct solution is to fix PHP itself by altering how it interfaces with OpenSSL, which could be as simple as altering a couple of lines of code. I'd su...
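A minimal sketch of that userland workaround, assuming $fp is an already-established ssl:// stream that has been put into non-blocking mode (variable names are mine, not from the bug report):

    <?php
    // Temporarily switch the SSL stream to blocking mode for the write so
    // that PHP doesn't retry SSL_write() with a different buffer pointer.
    stream_set_blocking($fp, true);
    $bytes = fwrite($fp, $data);
    // Restore non-blocking mode for the rest of the event loop.
    stream_set_blocking($fp, false);

The tradeoff, as the post notes, is that a slow peer can now stall the whole process during the write, which is why this is only viable "if you can".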

PHP-FIG, Composer, and other disasters from the last decade of PHP

Let's talk about PHP. The scripting language, not the health insurance. PHP is, in my opinion, one of the greatest development tools ever created. It didn't start out that way, which is where most of its bad rap comes from, but it has transformed over the past decade into something worth using for any size project (and people do!). More specifically, I've personally found PHP to be an excellent prototyping and command-line scripting tool. I don't generally have to fire up Visual Studio to do complex things because I have access to a powerful, cross-platform-capable toolset at my fingertips. It's the perfect language for prototyping useful ideas without being forced into a box. BUT! Some people WANT to force everyone into a box. Their box. Introducing the PHP Framework Interop Group, or PHP-FIG. A very professional-sounding group of people. They are best known as the folks who produce documents called PHP Standard Recommendations, aka PSRs. This group...

Need a random TCP port number for your Internet server application?

When writing a TCP server, the most difficult task at the beginning of the process is deciding what port number to use. The Transmission Control Protocol has a range of port numbers from 0 to 65535 (the range of an unsigned short integer, i.e. 2 bytes). In today's terms, that is a fairly small space of numbers, and it is quite crowded out there. Fortunately, there are some rules you can follow: Specifying port 0 will result in a random port being assigned by the OS. This is ideal only if you have some sort of auto-discovery mechanism for finding the port your users are interested in (e.g. connecting to a web server on port 80 and requesting the correct port number). Otherwise, you'll have to occupy an "open" port number. The first 1023 port numbers are reserved by some operating systems (e.g. Linux). Under those OSes, special permissions are required to run services on port numbers under 1024. Specifically, the process either has to have been started by the ...
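As a quick illustration of the port 0 behavior, here is a minimal PHP sketch (the loopback address is just a placeholder):

    <?php
    // Bind to port 0 and let the OS pick a free ephemeral port.
    $server = stream_socket_server("tcp://127.0.0.1:0", $errno, $errstr);
    if ($server === false)  die("Unable to start server: $errstr ($errno)\n");

    // Reports something like "127.0.0.1:49731"; the port varies per run,
    // which is why some auto-discovery mechanism is needed for clients.
    echo "Listening on " . stream_socket_get_name($server, false) . "\n";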

2015 Annual Task List Statistics

At the end of last year, I decided to start collecting some statistics about my ever-changing software development task list. To do that, I wrote a script that ran once per day and recorded some interesting information from my task list manager (a flat text file) and the number of open tabs in Firefox. What follows are some interactive (oooooh, shiny!) charts and some analysis: The number of tasks on my task list peaked twice this year at 78 tasks and dropped one time to 54 tasks. The number of tasks appears to be decreasing according to the trend line in the first chart. However, the second chart tells a slightly different story. Even though the number of tasks is on the decrease, the file size of the text file in which the tasks are stored is apparently on the increase. This tells me that the overall complexity of each individual task is slightly higher or I'm just slightly better at documenting details so I don't forget what the task entails (or some combinatio...

Why developers should do their own documentation and code samples

I was recently on the Microsoft Developer Network website (aka MSDN) looking at some API documentation. Many of the more popular APIs have code examples so the developer can see example usage rather than have to try to understand every nuance of the API before using it. The particular API that I was looking to use had an example, so I made the unfortunate decision to look at the code. The example was a turd. It wasn't a polished turd. It was just a normal, run-of-the-mill turd. The code had HANDLE leaks, memory leaks, and a bunch of other critical issues. It looked like it was written by a 20 line Norris Number programmer (aka newbie). Being rather bothered by this, I set out to learn how Microsoft produces its code samples. According to one source I found, the company hands the task off to interns. So, sample code that a whole bunch of other programmers are going to simply copy-pasta into their own code is being written by amateur programmers. Nothing could possibly go...

Let's NOT Encrypt - Critical problems with the new Mozilla-sponsored CA

Starting a new Certificate Authority is a time-consuming, expensive, and difficult task. It is also annoying to set up and maintain SSL/TLS certificates. So I completely understand what Let's Encrypt is trying to do. Their goal? Free, functional SSL/TLS certificates that are easy to create, install/deploy, and even keep up-to-date. What's not to like about that? Well, it turns out there are some serious problems with this up-and-coming Certificate Authority (CA). I'm going to list the issues in order of concern:

1. Doesn't solve the problems of storing roots in the browser or global trust issues.
2. A U.S.-based company.
3. Browser support/acceptance.
4. Sponsored by Mozilla.
5. Other, publicly traded, corporate sponsors.
6. A brand-new, relatively untested, and complex issuance protocol (ACME).
7. Limited clients (Python bindings only) and no libraries.
8. Linux only.

Each of these issues in detail: For the first issue, even though it is all we have got (Update Aug 2017...