Thursday, December 22, 2016

Virtual Private Servers (VPS) and Cloud hosting are now viable

For many, many years, I was a massive fan of dedicated web hosting. I was VERY vocal about how you couldn't run a legitimate, professional business without using dedicated web hosting. And time and time again, I was proven right as people came out of the woodwork in various places who had bet their business on shared hosting and lost - and sometimes they lost EVERYTHING, including their business and all their customers!

Shared web hosting is still the bottom of the barrel, scummy/scammy money grab that it has always been and no respectable business should be caught dead running their web infrastructure on it. Period. That hasn't changed.

However, I have been watching a couple of new stars grow from infancy and come into their own over the past 8 years: Virtual Private Servers, aka VPS, and their newer, shinier cousin, Cloud hosting.

Dedicated web hosting is expensive. It always has been, because you get a piece of hardware, a network drop, electricity, a transfer limit, an SLA (e.g. 99.9% uptime guaranteed), and a contract. On shared hosting, you can't do whatever you want and will be given the boot if you try to do much of anything with it. With dedicated hosting, however, the CPU, RAM, and hard drive are yours to do whatever you want with. Entry level dedicated servers start around $60/month for a 1 to 2 year contract period and rapidly go up to hundreds of dollars per month for beefier hardware. But you can run as many websites on a single piece of hardware as you are comfortable running on that one server. If you are looking for dedicated hosting, I still recommend 1&1 Dedicated Hosting.

Virtual Private Servers (VPS) have generally been a cheaper option, but with the serious caveat that you can't do much with them. A VPS is a blend between shared hosting and dedicated hosting. You get a mostly isolated OS instance (inside a virtual machine) but you share the same physical hardware with other virtual machines. Most VPS providers charge about half the cost of a dedicated server for a fraction of the CPU and RAM. CPU cores, RAM, and network bandwidth are isolated these days, but I/O requests to physical media (i.e. hard drives/SSD) are not, which means one virtual machine can still potentially starve the other machines on the same host. It's a problem still being worked on. Many VPS providers have also moved from hard drives to SSDs, which helps reduce I/O overhead. For consumers, most VPS providers simply aren't cost effective - the amount of hardware (CPU, RAM, storage), or lack thereof, is usually the bottleneck - and so most websites won't run very well on most VPS infrastructure. Some early adopter VPS providers eventually started offering expandable VPS solutions, which are the precursors to Cloud hosting options.

Cloud hosting is the newer, shinier kid on the block. When it first came out, it was generally more expensive than dedicated servers. I kept an eye on it but pretty much wrote it off as a toy that would take years to mature. The idea is simple: decouple data from specific hardware so that storage can be attached and detached at will and massively replicated around the globe, making it readily accessible at the point closest to the user, while additional hardware resources can be attached, detached, and migrated as usage rises and falls. That is difficult to implement, and the tools to build and maintain that sort of infrastructure didn't really exist at first. The tools were eventually developed and have matured over many years (as I predicted), and Cloud hosting has subsequently matured along with them. It is still more expensive to deploy than a VPS and can still be more expensive than dedicated hosting.

So, why write this post? As my dedicated hosting contract reached its end of life this year, I started looking around at my options. I was quite aware of Digital Ocean, which has an amazing programmable API for spinning up and down instances (Droplets) and doing all sorts of crazy things with virtual machines. After a lot of research and personal fiddling around with their API, I have more or less decided that Digital Ocean is only good for temporary, toy instances where you've got an idea you want to try out before deploying it for real. The general consensus I've seen in the larger community is that people shouldn't try running a real website on Digital Ocean or Amazon AWS. Amazon AWS, while similar, is also more expensive than Digital Ocean and both AWS' cost calculators and Amazon's horrible, convoluted Console, API, and SDK will drive you up the wall.

Then, after a lot more searching, I finally discovered OVH VPS and OVH Cloud Hosting. For ~$13.50/month, a fraction of what the same setup costs elsewhere in VPS land and beating some low-end dedicated hosting hardware-wise, OVH provides a fully functional 2 core VPS with 8GB RAM and enough monthly transfer for most businesses. Their offerings here, hands-down, absolutely crush Digital Ocean - the importance of having enough RAM overhead to actually do things can't be overemphasized! In addition, for half the cost of a low-end dedicated server, OVH's lowest end Cloud hosting option completely blows most low-end dedicated hosting out of the water in terms of hardware specs and scalability readiness. I honestly don't know how they are managing to do that and still turn a profit - based on some recent-ish server blades I've seen pop up, I have a few ideas but, even then, margins per blade are thin. The ONLY downsides to using OVH are that they don't have automatic billing/renewal capabilities for their VPS and Cloud hosting options and that I had to come up with an alternative to my previous firewall solution. The Canadian company has been around for so long that their payment system still uses CGI scripts to process payments (there's an early 2000's throwback for you). OVH should scrap their current payment system and use Stripe for a flexible, PCI compliant payment solution, which also happens to be what Digital Ocean currently uses.

In short, I stopped using 1&1 at the end of my contract period and have been quietly using OVH for many months now. My costs are significantly reduced but the hardware isn't as robust as before (to be expected - I went from an 8 core dedicated to a 2 core shared system) and not having automatic billing is a tad irritating. OVH makes it easy to renew, but no one should have to manually renew a standard service, for a wide variety of reasons - not the least of which is that everyone else in the industry offers automatic renewals and OVH is the weird one here. 1&1 has a configurable Cisco firewall for their dedicated server products that worked quite well - one of the reasons I stuck with them for so long. So I now deploy a good set of iptables rules plus the Web Knocker Firewall Service - a powerful firewall combo that is basically fire-and-forget and superior to most firewall setups.

Update March 2017: I recently discovered that OVH VPS servers have an IP level firewall available. Using it supposedly also helps keep a VPS from bouncing onto and off of their DDoS infrastructure, which has some issues of its own. I still believe Web Knocker is a better solution for a more refined firewall, but having dedicated upstream hardware is a nice addition.

You can still buy a brand new Dot Matrix printer...

Today, I learned that people still buy brand new dot matrix printers. You know, those extremely noisy printers I thought we ditched as soon as it was possible to do so. Well, except for the nutcases who turn them into "musical instruments" and start a YouTube channel:



But, no, sales of brand new(!) dot matrix printers are apparently still, relatively-speaking, alive and well:

Dot matrix printers on Newegg

After doing some research, it turns out that, for bulk printing where output quality and "professional" appearance don't matter at all, dot matrix printers can be anywhere from 4 to 8 times cheaper per printed page than laser printers (the next cheapest technology) when maintenance costs are amortized over the lifetime of each type of printer. With dot matrix, you're not going to get the speed, accuracy, or quietness of laser, but you'll supposedly save a boatload of money on toner.

Maybe one day we will get a printer that combines the best of all printing technologies in one compact, affordable device: dot matrix, laser, inkjet, 3D, and a bunch of other print heads. Ideally, it would be a single device that doesn't care about and automatically adapts to the type of material being printed on, including bulky and strange shapes. It also shouldn't fall apart after two months of use or cost an arm and a leg to maintain (inkjet printers - I'm looking at you). Basically, I'm asking for a Star Trek replicator.

I don't ask for much.

Tuesday, December 13, 2016

Bulk web scraping a website and then reselling the content can land you in legal hot water

This interesting article on web scraping just came to my attention:

New York Times: Auction Houses Face Off in Website Data Scraping Lawsuit

Summary: An auction house in New York is suing an auction house in Dallas for copyright violations over the scraping of the New York auction house's website listings, including their listing photos, and then SELLING those listings and photos in an aggregate database for profit.

As I'm the author of one of the most powerful and flexible web scraping toolkits (the Ultimate Web Scraper Toolkit), I have to reiterate the messaging found on the main documentation page: Don't use the toolkit for illegal purposes! If you are going to bulk scrape someone's website, you need to make sure you are legally free and clear to do so and that you respect reasonable rate limits and the like. Reselling the data acquired with a scraping toolkit seems like an extremely questionable thing to do from a legal perspective.

The problem with bulk web scraping is that it costs virtually nothing on the client end of things, while most pages with custom data these days are served dynamically to some extent and take not-insignificant CPU time and RAM on the server side to build a response. If one bad actor is excessively scraping a website, it drags down the performance of the website for everyone else. And if a website operator starts to rate limit requests, they can run into additional technical issues (e.g. accidentally blocking Googlebot and effectively delisting themselves from Google search results).

The Ultimate Web Scraper Toolkit is capable of easily hiding requests in ways that mimic a real web browser or a bot like Googlebot. It follows redirects, passes in HTTP Referer headers, transparently and correctly handles HTTP cookies, and can even extract forms and form fields from a page. It even has an interactive mode that comes about as close to the real thing from the command-line as a person can get in a few thousand lines of code. I'm only aware of a few toolkits that even come close to the capabilities of the Ultimate Web Scraper Toolkit, so I won't be surprised if it comes to light that what I've built has been used for illegal purposes despite the very clear warnings. With great power comes great responsibility, and it appears that some people just aren't capable of handling that.

Personally, I limit my bulk scraping projects and have only gotten into trouble one time. I was web scraping a large amount of publicly available local government data for analysis and the remote host was running painfully slowly. I was putting a 1 second delay between each request, which seemed reasonable, but apparently my little scraper project was causing serious CPU issues on their end and they contacted me about it. The correct fix on their end was probably to apply a database index, but I wasn't going to argue for that. I had already retrieved enough data for what I needed to do, so I terminated the scraper to avoid upsetting them. Even though I was well within my legal rights to keep the scraper running, maintaining a healthy relationship with people I might need to work with in the future is important to me.
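
For anyone who wants a concrete picture of what "polite" looks like, here is a minimal sketch of a rate-limited fetch loop using plain curl rather than my toolkit (the bot name, contact address, and filenames are made up for illustration):

# Fetch each URL from a list, one per second, with an identifiable user agent.
while read -r url; do
	curl -s -A "ExampleResearchBot/1.0 (contact: you@example.com)" "$url" >> scraped_data.txt
	sleep 1
done < urls.txt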

Here's an interesting twist: Googlebot is the worst offender when it comes to web scraping. I've seen Googlebot scrape little rinky-dink websites at upwards of 30,000 requests per hour! (And that's restricting the server logs to requests from Googlebot's officially published IP address range.) I'd hate to see Googlebot stats for larger websites. If you don't allow Googlebot to scrape your website, you don't get indexed by Google. If you rate limit Googlebot, you get dropped off the first page of Google's results. Ask any website systems administrator and they will tell you the same thing about Googlebot being a flagrant abuser of Internet infrastructure. But no one sues Google over that extreme abuse because everyone wants/needs to be listed on Google for various business operations to function. Yet at least one auction house is happy to sue another auction house while both are happy to let Googlebot run unhindered over the same infrastructure.

I believe FCC Net Neutrality rules could be used as a defense here to limit the damages awarded. IMO, the Dallas auction house, if they did indeed do what was claimed, violated copyright law - reselling scraped photos is the worst part. Then again, so has Google, and you simply can't play favorites under the FCC Net Neutrality rules. Those same listings are certainly web scraped and indexed by Google because the auction house wants people to find them, which means Google has similarly violated copyright law in order to index the site. Google then resells those search results in the form of the AdWords platform and turns a profit at the expense of the New York auction house. No one bats an eyelash at that legally questionable conflict of interest, and the path of logic to this point is arguably the basis of a monopoly claim against Google/Alphabet. IMO, under FCC Net Neutrality rules, unless the New York auction house (Christie's) similarly sues Google/Alphabet, they shouldn't be allowed to claim full copyright damages from the Dallas auction house.

At any rate, all of this goes to show that we need to be careful about what we scrape and we definitely shouldn't sell/resell what we scrape. Stay cool (and legal), folks!

Friday, December 09, 2016

Setting up your own Root Certificate Authority - the right way!



Setting up your own Root Certificate Authority, aka Root CA, can be a difficult process. Web browsers and e-mail clients won't recognize your CA out-of-the-box, so most people opt to use public CA infrastructure. When security matters, using a public CA is the wrong solution. Privately owned and controlled CAs can be infinitely more secure than their public counterparts. However, most people who set up a private CA don't set up their CA infrastructure correctly. Here is what most private CAs look like:

Root CA cert -> Server cert

This is wrong because the server certificate has to be regenerated regularly (e.g. annually), which means the root private key has to come out of storage regularly, and, if the root certificate is ever compromised, it takes fairly significant effort to replace all of the certificates, including the root itself. What should be built instead is this:

Root CA cert -> Intermediate cert -> Server cert

In fact, this is the format that most public CAs use. The root CA cert is generated on a machine that isn't connected to any network. Then it is used to generate any necessary intermediate certs. Both the root and intermediates are signed for long periods of time - typically about 10-30 years for the root and 2-5 years for the intermediates. The root CA certificate private key is then physically secured - a physical vault of some sort helps here. The root CA is never, ever used on a network-connected machine. It is always used offline and it is only ever used to generate intermediate certificates. Only when specific conditions are met is the private key ever accessed. Usually those conditions entail a paper trail of accountability with multiple people who are physically present and are authorized to access the key for purposes declared in advance.

After the root CA generates the intermediate certificates and is secured, the intermediate certificates are then used to generate other certificates. The intermediates can possibly sit on a network connected machine, but that machine is behind a very well-guarded firewall.

So how does one create this setup, while implementing it easily and securely, AND relatively cheaply? First, you are going to need a few things:

Wall-powered USB hub with at least 4 open ports (you can't use built-in USB ports on a computer for this since they are usually underpowered - blame your motherboard manufacturer)
Six USB thumbdrives (ideally, brand new - they can have tiny storage and therefore be cheap - if you are doing this for personal use only, you can scale it back to three thumbdrives)
VirtualBox or VMWare
KeePass
An ISO of a public Linux distro of some sort - smaller is better (e.g. Tiny Core Linux)
1 CD-R + CD burner
Labels and a pen or marker
Multiple, physically secure locations for storing the various thumbdrives

If you are familiar with virtual machines and constructing a root CA the right way, perhaps you can see where this is going. Hopefully, you will at least agree with me that where I'm going with this is a step above what most people do for their root CA and it is FAR easier and cheaper to isolate/secure a USB thumbdrive than it is a whole computer system.

Now let's get this started:

Make sure everyone who needs to sign off on the construction of the root CA is in the room and that they are well-watered/coffee'd and have been to the restroom recently. They aren't going anywhere for a while. If someone leaves during this time, there's a pretty good chance the entire process will have to be restarted, depending on how serious you are about the root CA. It will help speed things up considerably if the individual doing the technical side has gone through a dry run on their computer in advance to get familiar with the process. Also decide in advance which four of the people present will receive the thumbdrives containing the important data required to decrypt the virtual machine and the root CA private key.

Start an audit log file on the USB thumbdrive where the virtual machine will reside. Don't forget to label this thumbdrive with something like "Virtual machine, audit log". In the audit log, document the date, time, and notes for each operation - including each and every executed command against the virtual machine. The start of the audit log should declare who is present for the CA signing process as the first note. This file should be updated accordingly any time the thumbdrive is accessed in the future (e.g. to generate a new intermediate cert every couple of years).

The next step is to make the Linux distribution ISO read only to all users. Depending on the host OS, this step will vary. A simple solution for making the ISO file read only is to burn the ISO to a CD as an ordinary file (i.e. don't burn a bootable CD). If you plan to burn a CD, you might as well also burn a copy of the verification information mentioned in the next step. This portion of the process can be done in advance, but it should be noted in the audit log that the ISO was confirmed to be read only.
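
If you'd rather not burn a CD, a Linux host can accomplish roughly the same thing from the command line (a sketch - the filename is a placeholder, chattr requires root, and the immutable flag only works on filesystems that support it):

chmod a-w distro.iso
chattr +i distro.iso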

Next, verify the ISO against publicly published verification information such as a hash or a PGP signature. Then compare that information against the same information retrieved from the same source via a secondary location, in a way that defends against a man-in-the-middle attack. A different computer or device on a completely different network satisfies this (e.g. a smartphone visiting the same URL over a cell network). If that isn't possible and the same machine that retrieved the ISO has to be used, opening a secure session to a temporary VPS (e.g. a Digital Ocean droplet), running 'wget' on the URL containing the information, and then running 'cat' on the retrieved file may be sufficient for all present.
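
For example, checking a SHA-256 hash and a PGP signature from the command line looks something like this (a sketch - the filenames are placeholders and the distro's signing key needs to be imported into the GPG keyring first; compare the hash output character by character against the published value):

sha256sum distro.iso
gpg --verify distro.iso.sig distro.iso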

At this point, randomly generate the two main passwords for the virtual machine and CA root key using KeePass and store the database on the thumbdrive(s). If using a single thumbdrive for personal use, use the master password option. Otherwise, use the encryption key option and store the password database on one thumbdrive and the encryption key for the database on another. When operating in a group, clone each password database thumbdrive to another thumbdrive (up to four drives attached to the hub at this point - Hint: Plug in and remove drives one at a time to make labeling and cloning easier). Label each password database thumbdrive appropriately. They will be given to responsible individuals at the end of the process to keep secure along with instructions on use.

Now build the virtual machine with the ISO on the first thumbdrive, disabling unnecessary features such as audio. Enable encryption on the virtual machine disk image file and use the appropriate password from KeePass. Leave networking enabled for the moment. Install the OS, making sure that files are persistent across reboots. Install OpenSSL and verify that it runs. Then power off the virtual machine.

Disable networking. Physically disconnect the host's Ethernet cable/WiFi/etc. Disable/remove the optical drive controller in the virtual machine software. Connect the thumbdrive that will hold the certificates and keys to the host. Boot the virtual machine back up. Attach the newly connected thumbdrive to the guest using the virtual machine software (if it hasn't done so already) and then mount the thumbdrive inside the guest OS:

mkdir /mnt/thumbdrive
blkid
mount -t [TYPE] /dev/[something] /mnt/thumbdrive
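
For example, if blkid were to report the thumbdrive as a FAT-formatted partition at /dev/sdb1 (the actual device name and filesystem type will vary on your system), the mount command would become:

mount -t vfat /dev/sdb1 /mnt/thumbdrive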

Now you are ready to construct your root CA. From the terminal, run the following as root:

mkdir root_ca
chmod 710 root_ca
cd root_ca
openssl req -new -newkey rsa:4096 -x509 -days 3650 -keyout ca.key.pem -out ca.cert.pem

You will be asked a series of questions. Use the appropriate password from KeePass to protect the CA private key. Common Name should be something like "[Organization] Root Certificate Authority". Email Address should be left blank. 3650 is roughly 10 years. That's the length of time until you have to go through the whole process again. 7300 days is ~20 years, 10950 is ~30 years. Shorter times are, of course, more secure but create more hassle.

Then, run:

chmod 400 *.pem
openssl x509 -noout -text -in ca.cert.pem
cat ca.cert.pem
cp ca.cert.pem /mnt/thumbdrive/

The first command makes it so only the root user can read the files - why OpenSSL doesn't automatically chmod 400 everything it generates is a security vulnerability that should be fixed. The second command dumps out the information about the certificate (be sure to verify that "CA:TRUE" appears). The 'cat' command dumps the raw PEM certificate data to the screen (a sanity check for PEM formatted data). The last command copies the signed certificate to the thumbdrive. If you ever accidentally dump or copy the 'ca.key.pem' file to the thumbdrive or anywhere else outside the encrypted virtual machine, then you will have to start over.

Now we are ready to generate an intermediate certificate which will be used to sign all other certificates. Run:

sed s/CA:FALSE/CA:TRUE/ < /etc/ssl/openssl.cnf > openssl.cnf
openssl req -config openssl.cnf -new -newkey rsa:4096 -days 1095 -keyout intermediate_01.enckey.pem -out intermediate_01.req.pem
chmod 400 *.pem

The first line alters the OpenSSL configuration so that you can generate an intermediate certificate that can be used to sign other certificates. Depending on the OS, the openssl.cnf file might not be at /etc/ssl/. The second command again asks a series of questions. Similar sorts of responses should be used. Common Name should be something like "[Organization] Intermediate Certificate Authority". Leave the challenge password and company name empty. 1095 days is about 3 years. That's the length of time until the virtual machine will have to be fired up again to generate a new intermediate certificate.

openssl x509 -req -days 1095 -in intermediate_01.req.pem -CA ca.cert.pem -CAkey ca.key.pem -extfile openssl.cnf -extensions v3_ca -set_serial 01 -out intermediate_01.cert.pem
chmod 400 *.pem
openssl x509 -noout -text -in intermediate_01.cert.pem
cat intermediate_01.cert.pem
cp intermediate_01.cert.pem /mnt/thumbdrive/
cp intermediate_01.enckey.pem /mnt/thumbdrive/
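
As an optional sanity check that isn't part of the original command list, the new intermediate certificate can be verified against the root before anything leaves the virtual machine:

openssl verify -CAfile ca.cert.pem intermediate_01.cert.pem

It should print "intermediate_01.cert.pem: OK".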

Finally, power down the virtual machine. Disconnect all thumbdrives. The thumbdrive labeled as containing the encrypted virtual machine and audit log is moved into a physically secure location (e.g. a vault). The thumbdrive with the CA public certificate and the intermediate CA files moves over to other, properly firewalled, network-connected infrastructure. Once moved over, if automation is desired and authorized, the password can be removed from the intermediate CA private key:

openssl rsa -in intermediate_01.enckey.pem -out intermediate_01.key.pem

If you need more than one intermediate certificate or are renewing the certificate, adjust the above commands to increment all the 01's to the next available number (i.e. 02, 03, 04, 05, etc).

Four of the people present each receive one of the four thumbdrives containing the password database/decryption key after they sign for them. How this happens is up to the organization. Here is some sample verbiage: "I, ________, hereby accept one of the required components of the [Organization Name] root CA password data store. I will keep this device and the data on it in a secure location at all times. I will not copy, duplicate, or clone the data for any reason. I will not use the device or the data on the device on any machine connected to a network. If any of these terms are violated, disciplinary action may be taken against me up to and including termination. If I ever leave [Organization Name], I agree to transfer stewardship of this password data store to another [Organization Name] employee. Date/Signature"

This concludes how to correctly protect your nuclear arsenal and create a root CA on the cheap with OpenSSL.

Tuesday, November 08, 2016

Scary but fun workaround for a Windows OS bug

I know I'm late for Halloween but we are still between holidays. Back in 2012, I needed a way to start Apache in the background. No one had built anything (that I liked), so I over-engineered a solution:

https://github.com/cubiclesoft/createprocess-windows

It takes the core CreateProcess() Windows API call, soups it up, and makes nearly every option directly available to command-line applications and batch files.

Today I put the finishing touches on a wild new addition. The newest feature adds TCP/IP socket support, thanks to this StackOverflow post.

What makes the new feature scary is that ANY IP address can be used, not just localhost addresses. Therefore, you can have it connect out to a server on the Internet or a LAN (or your IoT toaster?) and, in fine unencrypted fashion, route data to/from the stdin, stdout, and stderr of ANY process of your choice. Security? What's that?

The feature was added to work around bugs in various scripting languages where they end up blocking on one of the three standard process pipes since pipes can't be easily switched to non-blocking mode on Windows. Using non-blocking, non-overlapped sockets solves the problem. Just don't use the feature for anything other than localhost IPs and everything will be fine.

Saturday, October 29, 2016

E-Ink Readers

Ever since e-ink came out, I've made an annual pilgrimage into the world of e-ink and e-readers and come away disappointed. This year is not really any different. Well, okay, it's actually worse because various people who run e-reader blogs are now saying e-ink is a dying technology. You know it's bad when it comes to that. That's kind of sad because the industry came SO incredibly and painfully close to minimizing or even eliminating power hungry backlit displays. Had smartphones been delayed just one year (i.e. had people said "meh" to iOS as they should have), we would likely have color e-ink tech today that rivals LCD. That's rather frustrating. But now I need to digress and cover a few core problems with e-ink that have ultimately relegated it to the background.

One of the core problems of e-ink is the refresh rate. A screen refresh in e-ink land is generally a giant flash to black to white to various bits of images popping in and out. It's a distracting seizure-inducing affair. Fujitsu actually somewhat solved the e-ink seizure-inducing screen flashing issue rather elegantly back in 2011 but apparently no one noticed:



There in that video, you can see an initial switch to white and, as the screen is redrawn, a black bar slides across the screen. The poor refresh rate of that e-ink display is somewhat hidden by the animation. Also, on a less critical but still noteworthy note, Fujitsu had color e-ink in 2011 that looked pretty decent-ish. Sigh.

Another core problem with e-ink is that the physical size of what can be obtained for under $500 is rather small. The standard today is a 6-inch "phablet" (neither phone nor tablet). 6" e-ink displays do not work at all for reading technical documentation. Sure, they are portable, but without a sufficient screen refresh rate and with only a grayscale screen, they are rather impossible to use. As e-ink displays get larger, the cost also seems to go up exponentially, to the point that buying an LCD laptop/tablet combo frequently makes more sense.

The final core problem with e-ink is that it first got sucked into devices that display books and then somehow never really showed up anywhere else. The book vendors that produced the majority of the devices used proprietary, closed platforms, which translated into no developers writing apps for those platforms. A consumer buys the device and is then stuck with whatever the vendor decided was good for them, which, as most developers know, usually doesn't have a good end result. Devices that cost a lot more and do one thing but happen to have the hardware to do many things are only slightly more horrible than kitchen gadgets that do one thing. Yes, I just compared your favorite e-reader to a toaster. Someone, somewhere at Amazon and similar book vendors made the inept decision to lock their devices down and not put sensible OSes on them, thus limiting their usefulness.

At any rate, I went on my annual pilgrimage this year with knowledge of the results of my efforts of previous years. And came away with the same amount of disappointment as usual. In short, not much has changed and there are fewer devices on the market and the remaining devices only received incremental improvements. Before I get to this year's devices, here's a brief history of several e-ink related technologies and related devices that seemed awesome for a while but either never made it to production or did make it to production but were killed off for unknown reasons:

The Qualcomm Mirasol display was extremely interesting until Qualcomm basically killed it off. Mirasol was something vastly different from e-ink and from the battery chugging active displays that we are familiar with. It even had a refresh rate of 15 fps and a slightly washed out color display, which was and is good enough for lightweight video playback. Supposedly a device with Mirasol had battery life similar to e-ink. It also managed to make it into real-world production in the forms of the Kyobo e-reader, which was only released in South Korea and never made it to the U.S., and the Toq, which was a very silly decision for a smartwatch. IMO, Mirasol failed to reach a wide audience due to bad marketing decisions on Qualcomm's side of things. Google is big enough to still attempt something serious here with Mirasol - even if they just produce developer units running stock Android.

Amazon owns a small company called Liquavista. Its display tech is similar to e-ink, but the demos show near real-time refresh rates and supposedly actual color reproduction! Liquavista calls it an Electrowetting Display (EWD), which uses colored oils instead of the usual ink bits, whatever that actually means. Their website is weird, but they were acquired and became an Amazon company somewhere along the line (2013?). My guess is that Amazon intends to either leverage the technology OR bury it. Given that it hasn't shown up in anything after four years, Liquavista might have also oversold what it can do (i.e. lied), in which case it will never see the light of day. In either case, this is lame. When potentially awesome tech dies, there needs to be information about why it didn't work so that maybe someone else can see a solution, pick up where they left off, and succeed.

E Ink - the actual company that started this part of the tech industry - introduced a "color" e-ink display at some point, but instead of using CMYK balls as everyone was hoping for, they opted for a color filter. The end result was a very dark, very dim display that could no longer produce white - only a gray of sorts - and it essentially killed off the idea of color e-ink for everyone. It was also too little, too late. Fujitsu's display, on the other hand, seemed like a legitimate implementation of color e-ink. Again, I never really saw anything come from Fujitsu but that one demo at some conference, which ultimately amounts to vaporware. Meanwhile, the color e-ink display that E Ink introduced only made it into a couple of commercial products - two extremely similar and expensive educational tablets, the jetBook Color and jetBook Color 2 - with colors so muted it is hard to tell whether they have any color at all. This past year, E Ink showed off an updated version of their color e-ink product that looks better than the original versions, but unless Amazon takes a liking to it for the Kindle, we can consider it mostly to completely dead.

As a software developer, I really only want devices I can write software for. That, of course, means that the device needs to be running an OS that I can push binaries onto or write code on directly. It also needs to be an OS that has a reasonably decent sized community around it. Since I don't generally want to try typing on an e-ink display, the options are basically limited to the most popular mobile device OSes because you push binaries from a desktop OS to them. That, of course, means something running Android with the Google Play store and has Android Studio capability. That requirement immediately eliminates about 99% of the e-ink devices that were ever released, including the entire Amazon Kindle e-ink series, which apparently runs some extremely touchy/picky non-Android OS that falls apart rather quickly if you root it. And, before anyone complains that I'm wanting a tablet, if I wanted a full blown tablet, I already have one with stock Android on it. My primary purpose for an e-ink (or e-ink like) Android device is quite different.

And now we reach the results of my search for this year. I ran into two devices this year that are sort of interesting but ultimately useless: the Energy Sistem Ereader Pro HD and the 13.3" Good e-Reader. The former is reasonably affordable but only has a 6" screen. It runs Android - not stock, and Android 4.2 (Jelly Bean) is kind of old at this point - but it does have Google Play. The Good e-Reader has a 13.3" screen but, based on several of the videos I watched, also has an extremely serious screen flicker issue reminiscent of early e-ink, is running an even older version of Android, is approximately 5 times as expensive as the Energy Sistem Ereader Pro HD, and isn't shipping yet because it's one of those crowdfunded operations. The various Onyx Boox devices also hit my radar for a bit, but even the latest is inferior to the Energy Sistem Ereader Pro HD. And, of course, none of these devices has a color display. Therefore, the results of my search are once again rather underwhelming.

It is still my opinion that Mirasol has the biggest potential for something to happen. Qualcomm just needs to get their act together. The underlying tech behind Mirasol is so vastly different from LCD and e-ink that it has the potential to dramatically transform mobile computing. The first company that produces a device with a 7" to 10" tablet form factor with a Mirasol + capacitive touch display running a stock build of the latest version of Android for around $200 to $300 gets my money. For the first iteration, it can also weigh up to one pound if that helps.

Saturday, August 13, 2016

Shipping a gift? Nifty ideas for online delivery

Everyone loves receiving gifts. It's the thought that counts! Or something like that. However, most families these days tend to spread out across the country, so we end up shipping gifts to each other. But those gifts are delivered in a boring brown box with a generic label where the return address is Amazon or another online store. The recipient might have ordered something else from Amazon or that online store too. And therefore they might open the gift before they are supposed to.

It turns out that there is a simple solution that is pretty cool. You can't get away from the boring brown box but you CAN do something about the shipping label on the boring brown box. Address labels typically look like:

First & Last Name
Address line 1
Optional address line 2
City, State ZIP, Country

Same-country shipping companies really only pay attention to the address lines and the City/State/ZIP line - the name line is mostly ignored. Let's hack the address label to make it do something useful for us that won't annoy shipping companies (too much). The first line is the most hackable/flexible, and here is the first thing you can try:

GIFT FOR {Recipient's first and last name} FROM {Your first name}

That's for delivery to the non-tech-savvy grandma. However, we're just getting started. For tech-savvy family members, you can do something much cooler. Since there are approximately 35 characters in the available space, we can also do this for the first line:

VISIT TINYURL {URL shortener code}

Which can then direct the recipient anywhere: a YouTube or Vine video, a funny image, or a custom website. Once you've convinced the recipient to open their web browser or a mobile app, your imagination is the limit! This also opens up quite a few programming opportunities. For example, the target host could have a custom API that emits a JSON object containing a bunch of extra information, which would allow it to be paired with an app that provides a more media-rich experience beyond what a web browser could provide. And that's just getting started. A package and the gift inside are really both a conversation starter and something memorable just waiting to happen.
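
For example, such an API might emit something like this (an entirely made-up sketch - the fields and values are hypothetical):

{
  "from": "Aunt Jane",
  "to": "Timmy",
  "message": "Do not open until December 25th!",
  "video": "https://example.com/gift-intro.mp4",
  "theme": "snowflakes"
}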

The URL shortener hack works today, as-is, with quite a few online stores including the major ones like Amazon, eBay, ThinkGeek, etc. If an online store has split the first and last name field into two separate fields, use "VISIT" for the first name and the rest of the string for the last name. The shipping company won't really care. Ideally, someone will come up with a way to have this work WITH the shipping companies so that if they really want to know the real name of the recipient (e.g. for insurance purposes), they can easily get at that information.

Note that a lot of address labels only have uppercase letters. Therefore, mixed-case and non-alphanumeric shortened URL codes probably won't work with this package hack. The only real downside to using a URL shortener is that anyone who handles the package can access the shortened URL and therefore visit the target URL. If you care about privacy (and you should), there are password protected URL shorteners out there (e.g. thinfi). Send the password to the recipient via e-mail (or to an app!) and the URL shortener code with their package. That way no one visits the target URL until the package arrives at its destination.

Wednesday, June 22, 2016

Elegant iptables rules for your Linux web server

You've just upgraded to your first VPS or dedicated server and you think you've got all the software bits in place (LAMP stack and all that) OR you've moved from a hosting provider with easily configured hardware firewalls. Now you've read something somewhere that says you need 'iptables' rules with your new host. If you have any friends who manage Linux systems, you've also heard that "iptables is hard." In my experience, the only thing hard about iptables is that no one seems to publish decent rulesets and users are left to figure it all out on their own. It doesn't have to be that way!

Ubuntu/Debian: apt-get install iptables-persistent

RedHat/CentOS: chkconfig iptables on

The first command installs the iptables-persistent package (Ubuntu/Debian), while the second enables the iptables service (RedHat/CentOS), so that saved rules load during the next boot.

Ubuntu/Debian:
/etc/iptables/rules.v4
/etc/iptables/rules.v6
RedHat/CentOS:
/etc/sysconfig/iptables
/etc/sysconfig/ip6tables
Those editable configuration files are where the IPv4 and IPv6 iptables rules are stored, respectively, and they are what gets loaded at boot by the previous step. Here is a good set of starter IPv4 rules:
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --syn --dport 80 -j ACCEPT
-A INPUT -p tcp --syn --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p icmp --fragment -j DROP
-A INPUT -p icmp --icmp-type 3 -j ACCEPT
-A INPUT -p icmp --icmp-type 4 -j ACCEPT
-A INPUT -p icmp --icmp-type 8 -j ACCEPT
-A INPUT -p icmp --icmp-type 11 -j ACCEPT
COMMIT
And a good set of starter IPv6 rules:
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --syn --dport 80 -j ACCEPT
-A INPUT -p tcp --syn --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p icmpv6 -j ACCEPT
COMMIT
To figure out which interface is the local loopback interface, you should run:
ifconfig -a
The rules above default to the 'lo' interface, which is probably correct unless you've got a weird host.

After that, you should change the rules to reflect the ports that you need open. To determine what ports are currently open, you can run:
netstat -plntu | grep -v 127.0.0.1: | grep -v ::1: | grep -v dhclient | grep -v ntpd
That command pipeline returns all running TCP/UDP servers that are not exclusively bound to localhost and aren't the standard DHCP client or the NTP daemon (by the way, you should have ntpd installed to avoid severe clock drift). That is, it shows all the ports that probably need to be firewalled properly. Use Google to search for any port numbers you don't recognize. (Hint: Port 22 is SSH/SFTP - it's included above and you probably want to leave that rule alone!) For each port you decide to allow, adjust the rules accordingly - probably by adding new lines that mostly mirror the other lines except that the --dport option will be different, as shown below.
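
For example, if netstat shows a mail server listening on port 25 that should be reachable from the outside (port 25 here is just an illustration), the extra line would look like this:
-A INPUT -p tcp --syn --dport 25 -j ACCEPT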

After the TCP rules, you should put any UDP rules you need. Since UDP servers are generally rare unless you are hosting a multimedia or game server, I didn't include any above, but a UDP rule looks like this:
-A INPUT -p udp --dport 2933 -j ACCEPT
Just replace the 'tcp' bit with 'udp' and drop the --syn option. Keep in mind that a lot of mobile technology (e.g. smartphones) doesn't handle UDP well over wireless networks. To accommodate mobile devices, it is a good idea to enable a TCP mode alongside any UDP servers and set up firewall rules for both, as shown below.
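
So a game server on, say, port 2933 (the example port from above) that accepts both protocols would end up with a pair of rules like these:
-A INPUT -p udp --dport 2933 -j ACCEPT
-A INPUT -p tcp --syn --dport 2933 -j ACCEPT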

Once you are ready to fire up the new rules, run commands similar to these:

Ubuntu/Debian:
iptables-restore < /etc/iptables/rules.v4
ip6tables-restore < /etc/iptables/rules.v6
RedHat/CentOS:
iptables-restore < /etc/sysconfig/iptables
ip6tables-restore < /etc/sysconfig/ip6tables
That's it! You are now a master of iptables rules. And it was just as easy to set up as, if not easier than, Ubuntu ufw or other system-specific solutions!

Let's say you get it in your head that you want to restrict access to a single IP address or an IP address range. IMO, if you can, leave your clean and elegant rules as-is and use either the Web Knocker Firewall Service or fail2ban instead of messing around with iptables. For static IP addresses that will never, ever change (really?) you can use the --src option (e.g. -A INPUT -p tcp --dport 22 --src 1.2.3.4 -j ACCEPT) but don't do that unless you really know what you are doing.

One other thing to consider doing is to make changes to your kernel and network stack. The file to edit is /etc/sysctl.conf and here are some relevant options (read the Internets before making such changes):
kernel.panic=600
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
net.ipv4.tcp_syncookies=1
net.ipv4.icmp_echo_ignore_broadcasts=1
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv6.conf.all.accept_source_route = 0
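After editing the file, something like the following should apply the new values immediately on most distributions, without waiting for a reboot:
sysctl -p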
The rest of this post is a quick overview of how the iptables rules work. The default policy of each iptables chain is ACCEPT with no rules, which means all packets are accepted. So the first thing that happens in the rules is to switch both the INPUT (packets coming in) and FORWARD (only relevant for redirecting packets - e.g. a router) policies to DROP (i.e. ignore all packets). The OUTPUT (packets headed out) policy is left as ACCEPT (i.e. accept all outbound packets). In my experience, there's never a valid reason to switch OUTPUT to DROP unless you actually want to create headaches for yourself. Keep it simple. Keep it elegant.

Now that we're past the policies, let's look at the rules themselves. The first rule says to have the 'state' kernel module for iptables check to see if an incoming packet is part of a RELATED or ESTABLISHED connection. If so, ACCEPT it and skip the rest of the rules. This is a great rule to have first because nearly all packets will hit this rule and immediately pass through the firewall. It's performance-friendly! It also shows that the ordering of the rules can be quite important for maximizing system performance.

The next rule lets all new connections to all ports on the 'lo' interface (localhost) through. Again, another nice, performance-friendly rule. After that, new connections to TCP ports 80, 443, and 22 are let through. The --syn option checks TCP flags for a valid SYN packet. Since most port 22 connections are extremely long-lived and, depending on the client, --syn might cause frequent disconnects, it is excluded from the rules.

After the TCP and optional UDP rules are the rules for ICMP packets. For IPv4, I drop fragmented ICMP packets since those types of packets are only ever used in a Denial of Service attack. ICMP types 3 and 4 are essential/required, type 8 is for ping (optional), and type 11 is for traceroute (also optional). IPv6 utilizes ICMP heavily, so blocking ICMPv6 traffic is currently considered bad practice. I've also not seen any particular firewall rulesets worth using for more strict ICMPv6 that don't look overly complicated. So I'm simply accepting all ICMPv6 traffic until someone points out issues with doing so (e.g. a confirmed CVE).

The last line simply COMMITs all of the changes and enables the new rules. If a rule fails to apply for some reason, iptables-restore will roll back all of the changes since the last COMMIT line. This is a really nice feature because you don't want to get halfway through the rules, encounter an error, and be locked out of the system.

By the way, Linux nerds, did you see how easy this was? This is totally what you should be doing. Useful things first such as complete iptables rulesets that people can copy and paste. Then slightly less useful, more technical things after that such as describing how those rules work for the people who really want to know.