Skip to main content

Posts

The Star Trek stardate for a better UNIX timestamp

A large body of software will break in unusual ways on January 19, 2038. This is known as the UNIX 32-bit timestamp bug. However, UNIX-style timestamps are used in all OSes, so it is a global phenomenon and a poorly designed software defect. 64-bit timestamps merely extend the problems presented by 32-bit timestamps to identify what day it is (i.e. what the end-user actually cares about). Perhaps there is a better timestamp we should have been using all along: Star Trek stardates.

Formal representations:

32-bit: 1 sign bit + 14 bit "day" + 17 bit percentage
64-bit: 1 sign bit + 31 bit "day" + 32 bit percentage

If we were using Star Trek-style stardates for date/time storage, our timestamps would have better precision and the upcoming 32-bit software problems would have happened early 2014 instead of waiting until 2038 for the breakages to happen. That is, force the people who created the problem to clean up their mess instead of letting them retire and h…
Recent posts

Secure web server permissions that just work

I have been doing web development since, well, web development basically began. And I've used a wide range of hosts. Since I don't see anyone stating answers succinctly and definitively anywhere, it is time to write a solution to the question on everyone's mind: What are the permissions that I should set for web server directories and files?

The first step is to identify the user that the web server will access files with/run under. For example, many Linux distributions set up 'www-data' as the user. I'll be focusing mostly on Linux as it powers about 66% of the Interwebs, but Windows Server users can benefit too.

It is important to get your setup correct from the very beginning. Propagating permissions down the website tree as new directories and files are created is critical to maintaining sanity. Knowing who created a specific file or directory is also important when working in a team. As always, if you can't trust other users who might have ac…

DNS cache spoofing/poisoning is useful for web developers

When most people hear the word "poison" they immediately conjure up bad things in their mind from some weird crime drama that they watch on TV. DNS cache poisoning (or spoofing) is generally considered a bad thing because it means that a domain name is resolved to the "wrong" IP address. It is usually used in terms of an attacker that gains access to a DNS host to deliver the wrong responses to DNS requests or intercepts and alters responses to requests, which then points the client at the wrong IP address.

DNS cache poisoning, however, can be used for a few positive, legitimate things. Let's say you want to relaunch a website on a different web host. To do this, you could develop it locally and then upload the files when you are finished to the new host and switch DNS over and watch it break spectacularly. But if you want to get a relaunch 95% right, you need to see the new website before DNS is switched over. To do this, DNS cache poisoning comes to th…

WTFPL is harmful to software developers

Occasionally, I run into a piece of software that utilizes an inappropriate license with a crude title. Today I want to talk about one of those licenses. It is called the WTFPL and it is harmful to any software developer that uses it.

I don't use foul language even among impolite company, so I'm not going to copy the license text here. You can read it if you want but it isn't necessary. There are about 300,000 words in the English language at any given time. Of those, about 200 words are considered to be rude, crude, foul, and generally inappropriate to use in most settings. The words a person chooses to use in casual conversation says a lot about them.

Language issues aside, the basic gist of the WTFPL license says that you can do whatever you want with the software that the license is associated with. If you look at a traditional software license (aka EULA) with its many pages of text and the various "license wars" out there, the idea behind the WTFPL …

Virtual Private Servers (VPS) and Cloud hosting are now viable

For many, many years, I was a massive fan of dedicated web hosting. I was VERY vocal about how you couldn't run a legitimate, professional business without using dedicated web hosting. And time and time again, I was proven right as people on shared web hosting came out of the woodwork in various places who had bet their business on shared hosting and lost - and sometimes they lost EVERYTHING including their business and all their customers!

Shared web hosting is still the bottom of the barrel, scummy/scammy money grab that it has always been and no respectable business should be caught dead running their web infrastructure on it. Period. That hasn't changed.

However, I have been watching a couple of new stars grow from infancy into its own over the past 8 years: Virtual Private Servers, aka VPS, and its newer, shinier cousin Cloud Hosting.

Dedicated web hosting is expensive. It has always been because you get a piece of hardware, a network drop, electricity, a transfe…

You can still buy a brand new Dot Matrix printer...

Today, I learned that people still buy brand new dot matrix printers. You know, those extremely noisy printers I thought we ditched as soon as it was possible to do so. Well, except for the nutcases who turn them into "musical instruments" and start a YouTube channel:



But, no, sales of brand new(!) dot matrix printers are apparently still, relatively-speaking, alive and well:

Dot matrix printers on Newegg

After doing some research, it turns out that, for bulk printing where output quality and "professional" appearance doesn't matter at all, dot matrix printers can be anywhere from 4 to 8 times cheaper than laser printers per printed page (the next cheapest technology) when amortized over the cost of maintenance of the lifetime of each type of printer. With dot matrix, you're not going to get the speed, accuracy, or the quietness of laser, but you'll supposedly save a boatload of money on toner.

Maybe one day we will get a printer that combines the …

Bulk web scraping a website and then reselling the content can land you in legal hot water

This interesting article on web scraping just came to my attention:

New York Times: Auction Houses Face Off in Website Data Scraping Lawsuit

Summary: An auction house in New York is suing an auction house in Dallas for copyright law violations regarding scraping the New York auction house's website listings including their listing photos and then SELLING those listings and photos in an aggregate database for profit.

As I'm the author of one of the most powerful and flexible web scraping toolkits (the Ultimate Web Scraping Toolkit), I have to reiterate the messaging found on the main documentation page: Don't use the toolkit for illegal purposes! If you are going to bulk scrape someone's website, you need to make sure you are free and clear legally for doing so and that you respect reasonable rate limits and the like. Reselling the data acquired with a scraping toolkit seems like an extremely questionable thing to do from a legal perspective.

The problem with bul…

Setting up your own Root Certificate Authority - the right way!

Setting up your own Root Certificate Authority, aka Root CA, can be a difficult process. Web browsers and e-mail clients won't recognize your CA out-of-the-box, so most people opt to use public CA infrastructure. When security matters, using a public CA is the wrong solution. Privately owned and controlled CAs can be infinitely more secure than their public counterparts. However, most people who set up a private CA don't set up their CA infrastructure correctly. Here is what most private CAs look like:

Root CA cert -> Server cert

This is wrong because the server certificate has to be regenerated regularly (e.g. annually). If the root certificate is compromised, then it involves fairly significant effort to replace all of the certificates, including the root. What should be built is this:

Root CA cert -> Intermediate cert -> Server cert

In fact, this is the format that most public CAs use. The root CA cert is generated on a machine that isn't connected to …