Saturday, August 13, 2016

Shipping a gift? Nifty ideas for online delivery

Everyone loves receiving gifts. It's the thought that counts! Or something like that. However, most families these days tend to spread out across the country and so we end up shipping gifts to each other. But they are delivered in a boring brown box with a generic label on them where the return address is Amazon or another online store. And they might have ordered something else from Amazon or that online store too. And therefore they might open the gift before they are supposed to.

It turns out that there is a simple solution that is pretty cool. You can't get away from the boring brown box but you CAN do something about the shipping label on the boring brown box. Address labels typically look like:

First & Last Name
Address line 1
Optional address line 2
City, State ZIP, Country

I've bolded the parts that same-country shipping companies actually pay attention to. Let's hack the address label to make it do something useful for us that won't annoy shipping companies (too much). The first line is the most hackable/flexible and here is the first thing you can try:

GIFT FOR {Recipient's first and last name} FROM {Your first name}

That's for delivery to the non-tech-savvy grandma. However, we're just getting started. For tech-savvy family members, you can do something much cooler. Since there are approximately 35 characters in the available space, we can also do this for the first line:

VISIT TINYURL {URL shortener code}

Which can then direct the recipient anywhere: A YouTube or Vine video, a funny image, or a custom website. Once you've convinced the recipient to open their web browser or a mobile app, your imagination is the limit! This also opens up quite a few programming opportunities. For example, the target host could have a custom API that emits a JSON object containing a bunch of extra information, which would allow it to be paired with an app that can provide a more media-rich experience beyond what a web browser could provide. And that's just getting started. A package and the gift inside is really both a conversation starter and something memorable just waiting to happen.

The URL shortener hack works today, as-is, with quite a few online stores including the major ones like Amazon, eBay, ThinkGeek, etc. If an online store has split the first and last name field into two separate fields, use "VISIT" for the first name and the rest of the string for the last name. The shipping company won't really care. Ideally, someone will come up with a way to have this work WITH the shipping companies so that if they really want to know the real name of the recipient (e.g. for insurance purposes), they can easily get at that information.

Note that a lot of address labels only have uppercase letters. Therefore, mixed-case and non-alphanumeric shortened URL codes probably won't work with this package hack. The only real downside to using a URL shortener is that anyone who handles the package can access the shortened URL and therefore visit the target URL. If you care about privacy (and you should), there are password protected URL shorteners out there (e.g. thinfi). Send the password to the recipient via e-mail (or to an app!) and the URL shortener code with their package. That way no one visits the target URL until the package arrives at its destination.

Wednesday, June 22, 2016

Elegant iptables rules for your Linux web server

You've just upgraded to your first VPS or dedicated server and you think you've got all the software bits in place (LAMP stack and all that) OR you've moved from a hosting provider with easily configured hardware firewalls. Now you've read something somewhere that says you need 'iptables' rules with your new host. If you have any friends who manage Linux systems, you've also heard that "iptables is hard." In my experience, the only thing hard about iptables is that no one seems to publish decent rulesets and users are left to figure it all out on their own. It doesn't have to be that way!

Ubuntu/Debian: apt-get install iptables-persistent

RedHat/CentOS: chkconfig iptables on

That installs the iptables persistent package/enables storing iptables so that they load during the next boot.

Those editable configuration files are where the IPv4 and IPv6 iptables rules are stored respectively and are loaded from on boot with the previous bit. Here is a good set of starter IPv4 rules:
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --syn --dport 80 -j ACCEPT
-A INPUT -p tcp --syn --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p icmp --fragment -j DROP
-A INPUT -p icmp --icmp-type 3 -j ACCEPT
-A INPUT -p icmp --icmp-type 4 -j ACCEPT
-A INPUT -p icmp --icmp-type 8 -j ACCEPT
-A INPUT -p icmp --icmp-type 11 -j ACCEPT
And a good set of starter IPv6 rules:
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --syn --dport 80 -j ACCEPT
-A INPUT -p tcp --syn --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p icmpv6 -j ACCEPT
You should run:
ifconfig -a
To figure out what interface is the local loopback interface. The rules above default to the 'lo' interface, which is probably correct unless you've got a weird host.

After that, you should change the rules to reflect the ports that you need open. To determine what ports are currently open, you can run:
netstat -plntu | grep -v | grep -v ::1: | grep -v dhclient | grep -v ntpd
That set of commands returns all running TCP/UDP servers that are not exclusively localhost and aren't the standard DHCP client or the NTP daemon (by the way, you should have ntpd installed to avoid severe clock drift). That is, it will show all the ports that probably need to be firewalled properly. Use Google to search for any port numbers you don't recognize. (Hint: Port 22 is SSH/SFTP - it's included above and you probably want to leave that rule alone!) For each port you decide to allow, adjust the rules accordingly - probably by adding new lines that mostly mirror other lines except the --dport option will be different.

After the TCP rules, you should put any UDP rules you need. Since UDP is generally rarer to see except if you are hosting a multimedia or game server, I didn't include any above, but they look like this:
-A INPUT -p udp --dport 2933 -j ACCEPT
Just replace the 'tcp' bit with 'udp' and drop the --syn option. Keep in mind that a lot of mobile technology (e.g. smartphones) don't support UDP over wireless networks. To accommodate mobile devices, it is a good idea to enable TCP mode alongside any UDP servers and set up firewall rules for both.

Once you are ready to fire up the new rules, run commands similar to these:

iptables-restore < /etc/iptables/rules.v4
ip6tables-restore < /etc/iptables/rules.v6
iptables-restore < /etc/sysconfig/iptables
ip6tables-restore < /etc/sysconfig/ip6tables
That's it! You are now a master of iptables rules. And it was just as easy to set up, if not easier than, Ubuntu ufw or other system-specific solutions!

Let's say you get it in your head that you want to restrict access to a single IP address or an IP address range. IMO, if you can, leave your clean and elegant rules as-is and use either the Web Knocker Firewall Service or fail2ban instead of messing around with iptables. For static IP addresses that will never, ever change (really?) you can use the --src option (e.g. -A INPUT -p tcp --dport 22 --src -j ACCEPT) but don't do that unless you really know what you are doing.

One other thing to consider doing is to make changes to your kernel and network stack. The file to edit is /etc/sysctl.conf and here are some relevant options (read the Internets before making such changes):
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv6.conf.all.accept_source_route = 0
The rest of this post is a quick overview of how the iptables rules work. The default policies of iptables is ACCEPT with no rules, which means all packets are accepted. So the first thing that happens in the rules is to switch both INPUT (packets coming in) and FORWARD (only relevant for redirecting packets - e.g. a router) policies to DROP (i.e. ignore all packets). The OUTPUT (packets headed out) policy is left as ACCEPT (i.e. accept all outbound packets). In my experience, there's never a valid reason to switch OUTPUT to DROP unless you actually want to create headaches for yourself. Keep it simple. Keep it elegant.

Now that we're past the policies, let's look at the rules themselves. The first rule says to have the 'state' kernel module for iptables check to see if an incoming packet is part of a RELATED or ESTABLISHED connection. If so, ACCEPT it and skip the rest of the rules. This is a great rule to have first because nearly all packets will hit this rule and immediately pass through the firewall. It's performance-friendly! It also shows that the ordering of the rules can be quite important for maximizing system performance.

The next rule lets all new connections to all ports on the 'lo' interface (localhost) through. Again, another nice, performance-friendly rule. After that, new connections to TCP ports 80, 443, and 22 are let through. The --syn option checks TCP flags for a valid SYN packet. Since most port 22 connections are extremely long-lived and, depending on the client, --syn might cause frequent disconnects, it is excluded from the rules.

After the TCP and optional UDP rules are the rules for ICMP packets. For IPv4, I drop fragmented ICMP packets since those types of packets are only ever used in a Denial of Service attack. ICMP types 3 and 4 are essential/required, type 8 is for ping (optional), and type 11 is for traceroute (also optional). IPv6 utilizes ICMP heavily, so blocking ICMPv6 traffic is currently considered bad practice. I've also not seen any particular firewall rulesets worth using for more strict ICMPv6 that don't look overly complicated. So I'm simply accepting all ICMPv6 traffic until someone points out issues with doing so (e.g. a confirmed CVE).

The last line simply COMMITs all of the the changes and enables the new rules. If a rule fails to apply for some reason, iptables-restore will roll back all of the changes since the last COMMIT line. This is a really nice feature because you don't want to get through half of the rules, encounter an error, and be locked out of the system.

By the way, Linux nerds, did you see how easy this was? This is totally what you should be doing. Useful things first such as complete iptables rulesets that people can copy and paste. Then slightly less useful, more technical things after that such as describing how those rules work for the people who really want to know.

Saturday, June 04, 2016

The most interesting bug in PHP

The most interesting bug in PHP is the showstopper bug in the core of PHP you finally run into after a month of software development just as you are getting ready to ship a brand new product out the door. Specifically, PHP bug #72333, which is in all current versions of PHP. If you aren't familiar with reading C code, it can be extremely hard to follow along with that bug report especially since PHP streams behind-the-scenes are ugly beasts to try to wrap your head around (mine's still spinning and I wrote the bug report). In short, the problem is a combination of non-blocking mode with SSL sockets when calling SSL_write() with different pointers in 'ext\openssl\xp_ssl.c'.

The temporary patch in userland is to disable non-blocking mode when writing data - if you can - I'm not so sure I can/should. The correct solution is to fix PHP itself by altering how it interfaces with OpenSSL, which could be as simple as altering a couple of lines of code. I'd submit a patch, but I'm not entirely sure what the correct course of action should be since the problem happens so deep in the code and even my suggested fix might cause the more common case (i.e. blocking sockets) to break. It's kind of rare to need the ability to write tons of data to non-blocking SSL sockets in PHP, so it is not surprising that very few people have run into the issue.

Once you've started reading the actual C source code to PHP, it becomes rather frustrating to see how few people actually read the source code to PHP. This is no more self-evident than the comments section on every documentation page on, GitHub, Stack Overflow, forums, and mailing lists where people make uninformed guesses and subsequently pollute issue trackers and Google search results. I blame a combination of laziness and...wait, no, it's pretty much laziness. You can actually download the source code to PHP here [mind blown]. Instead of just blindly compiling and running PHP, you can actually read the source code [mind blown again].

Of course, that doesn't mean the PHP source code is easy to follow - it is written in C and 80% of the code is basically a hodgepodge of horribleness that exists to deal with cross-platform and third-party library integration issues and various bits of ancient cruft that have stuck around from the very beginning of the language. It would probably look a lot cleaner though if the PHP documentation itself linked to the source code (I opened that ticket too but missed proofreading one sentence - sigh). After all, most people tend to spruce things up when they know guests are coming over to visit.

Saturday, April 23, 2016

PHP-FIG, Composer, and other disasters from the last decade of PHP

Let's talk about PHP. The scripting language, not the health insurance. PHP is, in my opinion, one of the greatest development tools ever created. It didn't start out that way, which is where most of its bad rap comes from, but it has transformed over the past decade into something worth using for any size project (and people do!). More specifically, I've personally found PHP to be an excellent prototyping and command-line scripting tool. I don't generally have to fire up Visual Studio to do complex things because I have access to a powerful cross-platform capable toolset at my fingertips. It's the perfect language for prototyping useful ideas without being forced into a box.

BUT! Some people WANT to force everyone into a box. Their box. Introducing the PHP-Framework Interop Group or PHP-FIG. A very professional sounding group of people. They are best known as the folks who produce documents called PHP Standard Recommendations aka PSRs. This group of 20 or so people from a wide-range of very popular projects have gotten together to try to work out some of the problems they have encountered when working with PHP. Their goal is simple:
"The idea behind the group is for project representatives to talk about the commonalities between our projects and find ways we can work together. Our main audience is each other, but we’re very aware that the rest of the PHP community is watching. If other folks want to adopt what we’re doing they are welcome to do so, but that is not the aim. Nobody in the group wants to tell you, as a programmer, how to build your application."
No, "We'll just let everyone else tell you how to build your application." At least that's the implication and it certainly is what seems to be happening.

There's nothing wrong with having Standards. In fact, I'm a strong advocate of them. What I'm NOT an advocate of is being told that my code has to be written a specific way by clueless people who blindly follow PHP-FIG PSRs without understanding where they are coming from. The worst offender is basically everyone in the Composer camp. In software development, the more dependencies you have, the more likely it is that your project will break in spectacular ways. And, as we all know, everything breaks at the most inopportune times. Composer takes that concept to its extreme conclusion and introduces the maximum amount of dependencies into your software project all at once. No thank you very much. Correct software development attempts to reduce dependencies to the bare minimum to avoid costly breakages.

Composer exists because PSRs and lazy programmers who don't know how to develop software exist. PSRs exist because PHP-FIG exists.

The worst PSR in PHP-FIG is PSR-4, formerly PSR-0: The autoloader. As hinted by the zero (0) in "PSR-0", it was the first accepted "Standard" by PHP-FIG - and I use the word Standard loosely here. The concept of the autoloader stems from a very broken portion of PHP known as a namespace. In most normal programming languages that implement namespaces, the idea is to isolate a set of classes or functions so they won't conflict with other classes and functions that share the same name. Then the application developer can choose to 'import' (or 'use') the namespace into their project and the code compiler takes care of the rest at compile-time - all the classes and functions of the whole namespace become immediately available to the code.

That sounds great! So what could possibly go wrong?

In PHP, however, namespaces were only halfway implemented. PHP developers have to declare, up front, each class they want to 'use' from a namespace to simplify later code AND manually load each file that contains the appropriate class. This, of course, created a problem - how to get the files to load that contain the code for the class without writing a zillion 'require_once' lines? Instead of correctly implementing namespaces and coming up with a sane solution, a hack was developed known as __autoload() and later became a formalized hack known as spl_autoload_register(). I call it a hack because the autoloader is effectively an exception handler for a traditional code compiler - something no one in their right mind would ever write. With an autoloader, at the very last moment before PHP would throw up an error about a missing class, the autoloader catches the exception and tells PHP, "Oh never mind about that, I got it." Thinking about all of the backend plumbing required to make THAT nonsense happen (instead of correctly implementing namespaces in PHP) makes my head hurt.

Exception handlers, when written correctly, do nothing except report the exception upstream and then bail out as fast as possible from the application. Exceptions happen when an unrecoverable error condition occurs. Good developers don't try to recover from an exception because they realize they are in a fatal, unrecoverable position. (This is why Java is fundamentally broken as a language and a certain company that shall not be named made many terrible decisions to ultimately select Java as their language of choice for a certain popular platform that shall also not be named.)

Instead of fixing the actual problem (i.e. broken namespace support), us PHP userland developers get the autoloader (i.e. a hack). Composer and its ilk then builds upon the broken autoloader concept to create a much larger, long-term disaster: Shattered libraries that have dependencies on project management tools that someone may or may not want to use (Hint: I don't) and dependencies on broken implementations of certain programming concepts that should be fixed (i.e. PHP namespaces. By the way, don't use things that are broken until they have been fixed - otherwise you end up with hacks).

Another problem lies in the zillions of little files that PHP-FIG PSRs have directly resulted in (e.g. insane rules like "every class MUST be in its own file"), which results in huge increases in upload times to servers over protocols like SFTP (and FTP). What is known as a "standalone build" is pretty rare to see these days. A standalone build takes all of the relevant files in a project and merges them into one file. A good standalone build tool also allows users to customize what they receive so the file doesn't end up having more bloat than what they actually need.

Congratulations PHP-FIG: You've successfully exchanged one problem (i.e. poorly written classes) for a different problem (i.e. poorly written classes spanned across hundreds of files with massive, unnecessary dependencies that take forever to upload and rely on broken-by-design non-features of PHP).

Friday, April 22, 2016

Need a random TCP port number for your Internet server application?

When writing a TCP server, the most difficult task at the beginning of the process is deciding what port number to use. The Transmission Control Protocol has a range of port numbers from 0 to 65535. The range of an unsigned short integer (2 bytes). In today's terms, that is a fairly small space of numbers and it is quite crowded out there. Fortunately, there are some rules you can follow:

  • Specifying port 0 will result in a random port being assigned by the OS. This is ideal only if you have some sort of auto-discovery mechanism for finding the port your users are interested in (e.g. connecting to a web server on port 80 and requesting the correct port number). Otherwise, you'll have to occupy an "open" port number.
  • The first 1023 port numbers are reserved by some operating systems (e.g. Linux). Under those OSes, special permissions are required to run services on port numbers under 1024. Specifically, the process either has to have been started by the 'root' user OR been declared okay by the 'root' user with special kernel commands (e.g. setcap 'cap_net_bind_service=+ep' /path/to/program).
  • Of the remaining port numbers, huge swaths of the space have been used for what are known as Ephemeral Ports. These are randomly selected and used for temporary connections. 1024-5000 have been used by a number of OSes. IANA officially recommends 49152-65535 for the Ephemeral Ports.
  • Port 8080 is the most common "high" port that people use (i.e. alternate web server port). Avoiding that is a good idea.
Having all of that knowledge is useful but what you need is a random number generator that can generate that port number for you. Without further ado, your new port number awaits your clickity click or tappity tap:

Get a random TCP port number

As long as it doesn't output 8080, you are good. If it does output 8080 and/or stomping on IANA assignments bothers you, reload the page and try again.

Friday, January 01, 2016

2015 Annual Task List Statistics

At the end of last year, I decided to start collecting some statistics about my ever-changing software development task list. To do that, I wrote a script that ran once per day and recorded some interesting information from my task list manager (a flat text file) and the number of open tabs in Firefox. What follows are some interactive (oooooh, shiny!) charts and some analysis:

The number of tasks on my task list peaked twice this year at 78 tasks and dropped one time to 54 tasks. The number of tasks appears to be decreasing according to the trend line in the first chart. However, the second chart tells a slightly different story. Even though the number of tasks is on the decrease, the file size of the text file in which the tasks are stored is apparently on the increase. This tells me that the overall complexity of each individual task is slightly higher or I'm just slightly better at documenting details so I don't forget what the task entails (or some combination of both).

The final chart is probably the most interesting and perhaps the most telling. It shows how many open tabs I have in Firefox. Firefox is my primary web browser in which I do all of my research for my software development. I tend to close a bunch of tabs around the time I release a new version of my software. As a result, I figured it would be a good measure of my development habits. During the early part of the year, I got the number of open tabs down to 70. And that's the lowest it went. At the early part of December, however, the number of open browser tabs dramatically spiked to 244 and dropped to 90 open tabs just a couple of weeks later. If you look at my forum activity around the time the tabs dropped, you see a correlation to when I teased a new piece of software and how much I really dislike certain aspects of the Windows API. About 150 browser tabs were open for various bits of information to help construct a brand new piece of software. The overall trend line for browser tabs is, of course, on the increase. I have a feeling, based on the official 2016 CubicleSoft project list, that the increasing line will remain the trend.

There are other drops in task and tab counts that correlate to various software releases. I'm personally most interested in overall trends. I really want to see the number of tasks diminish over time. I'd like to see complexity on the decrease too. It would be awesome to completely wipe out all of my browser tabs. None of those are particularly realistic, but I can dream. There are projects I need to do to get to a point where I feel like software has stabilized.

(By the way, I'm aware the spreadsheet that the data is in is public. There's not much else in the data beyond what is seen in the charts but maybe someone will come up with some additional and interesting anecdotes.)

Saturday, November 21, 2015

Why developers should do their own documentation and code samples

I was recently on the Microsoft Developer Network website (aka MSDN) looking at some API documentation. Many of the more popular APIs have code examples so the developer can see example usage rather than have to try to understand every nuance of the API before using it. The particular API that I was looking to use had an example, so I made the unfortunate decision to look at the code. The example was a turd. It wasn't a polished turd. It was just a normal, run-of-the-mill turd. The code had HANDLE leaks, memory leaks, and a bunch of other critical issues. It looked like it was written by a 20 line Norris Number programmer (aka newbie).

Being rather bothered by this, I set out to learn how Microsoft produces its code samples. According to one source I found, the company hands the task off to interns. So, sample code that a whole bunch of other programmers are going to simply copy-pasta into their own code is being written by amateur programmers. Nothing could possibly go wrong with that. If the examples are indeed written by interns, it certainly explains why the quality of the code samples in the documentation is all over the map ranging from really bad to barely passable. It's certainly not what I would expect from a professional organization with 50,000 employees. If you open a HANDLE, close it. Allocate memory? Free it. Simple things that aren't hard to do but help achieve a level of professionalism because you know that other people are just going to copy the example into their code, expect it to work, and not have unforeseen bugs in production.

MSDN is the face of Microsoft most people don't really get to see unless they start developing for the Windows OS. But it matters who produces the documentation because a single mistake is going to affect (tens/hundreds of) thousands of applications and millions (billions?) of people. API documentation is almost always too intricate for most other developers to fully understand. While it is the be-all-end-all definitive overview of any give API call, code examples provide context and meaning. A lot of people struggle with "so if I use this API, what do I do next" but have the "aha!" moment when they see a working example connecting the API to other code. Developers will copy and paste an example long before they fully comprehend any given API. For this reason, code examples need to have the same care and professionalism applied to them as the API itself. Passing this responsibility off to an intern is going to create significant long-term problems.

Writing your own code examples for an API also has the benefit of revealing bugs in the API. If the developer who made the API is writing the documentation for it and the code sample, they are 15 zillion times more likely to spot mistakes and correct them before they get released into the wild. Pass that responsibility off to an intern? Well, the intern is going to not run into or just ignore the bugs in the API because THEY DON'T CARE. They want the paycheck and the checkmark on their graduation forms that says they did their internship. Users (developers) have to live with the disaster that interns leave behind in their wake. Putting them on documentation and code example writing tasks means interns will be the face of the company that developers (i.e. the people who matter the most) will see. That strikes me as unprofessional.

In short: Develop an API? Do your own documentation and code sample writing. Is it tedious and boring? Yes. But it is important to do it anyway. In fact, it is infinitely more important than the API you wrote.