Saturday, February 18, 2006

Eliminating web advertisements

I've discovered the perfect solution to removing advertisements from 99% of all websites. Well, I've known about the solution for a long time, but only last night did I find the last piece of the puzzle.

A little-known aspect of Windows computers is the 'hosts' file. Linux people use this file all the time for various purposes, just because Linux is all about editing configuration files. Gag. Anyway, I've known for some time that I can eliminate advertisements on the web by poisoning my DNS cache via a modified 'hosts' file.

At this point you are wondering about various things like, "What is DNS?" "What is a DNS cache?" "What is DNS cache poisoning?" I'm so glad that you are asking smart questions like those instead of clicking the 'back' button. DNS is short for Domain Name System. Basically, when you type in www.yahoo.com, your computer asks your ISP's DNS server for the IP address of www.yahoo.com. Usually the ISP's DNS server doesn't already know the address, so it has to look it up. It does that by working its way down a hierarchy of DNS servers: one that knows about '.com', then one that knows about 'yahoo.com', and so on. Obviously there has to be some sort of starting point at the top of that hierarchy. Those are the root servers, and there are 13 of them:

http://en.wikipedia.org/wiki/Root_nameserver

If you want to play around with lookups to learn how they work, go to a command prompt and type in 'nslookup'. That connects you to your configured DNS server (usually your ISP's). You can tell nslookup to switch to the root servers by typing in 'root'. The root servers only contain information on how to find the DNS servers responsible for top-level domains like '.com', so a root server query is only the starting point for obtaining an IP address.
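
To make the "start at the root and follow referrals" idea concrete, here is a toy model in Python. It is only a sketch: the servers are plain dictionaries with made-up names (root, com-server, yahoo-server are hypothetical), and the address comes from the 192.0.2.x range reserved for documentation. Real resolvers send these queries over the network and handle many more record types.

```python
# Toy model of iterative DNS resolution. Each "server" is a dictionary that
# either answers with an address ("A") or refers the asker to a server that is
# closer to the answer ("NS"). All names and addresses here are made up.
SERVERS = {
    "root": {"com.": ("NS", "com-server")},
    "com-server": {"yahoo.com.": ("NS", "yahoo-server")},
    "yahoo-server": {"www.yahoo.com.": ("A", "192.0.2.10")},
}


def resolve(name, server="root", max_steps=10):
    """Follow referrals starting at the root until some server returns an address.

    Returns (ip, list_of_servers_asked).
    """
    if not name.endswith("."):
        name += "."  # fully qualified names end in a dot
    asked = []
    for _ in range(max_steps):
        asked.append(server)
        zone = SERVERS[server]
        # Which of this server's entries cover the name we are looking up?
        matches = [s for s in zone if name == s or name.endswith("." + s)]
        if not matches:
            raise LookupError(f"{server} has no data for {name}")
        kind, value = zone[max(matches, key=len)]  # most specific match wins
        if kind == "A":
            return value, asked
        server = value  # referral: ask the next server down the hierarchy
    raise LookupError("too many referrals")
```

Calling resolve("www.yahoo.com") walks root, then com-server, then yahoo-server, which is the same path nslookup lets you trace by hand after typing 'root'.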

IP addresses are how computers on the Internet find and talk to each other. Basically, a web browser, an e-mail client, an FTP client, and so on are all sugar-coating applications that wrap up protocols that use IP addresses to identify the servers sitting on the Internet.

DNS caches are how the Internet keeps from getting bogged down with requests for name-to-IP mappings. Basically, each host in the DNS chain keeps track of the name-to-IP mappings it has already resolved. Of course, this introduces the problem of making sure everyone gets updates. Sometimes hosts change IP addresses, which means the old name-to-IP address mappings are no longer valid. So each DNS entry has a Time To Live (TTL) associated with it, which says how long a cache may keep the entry before it has to look the name up again.
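
A minimal sketch of a TTL-based cache, assuming a single address per name. DnsCache and its methods are hypothetical names for illustration; real resolvers also cache failed lookups, multiple records per name, and so on. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Hypothetical name -> IP cache where every entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds):
        """Remember a mapping until ttl_seconds from now."""
        self._entries[name] = (ip, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached IP, or None if it is missing or expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: the caller must look it up again
            return None
        return ip


# Usage with a fake clock so the expiry is easy to see:
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.1", ttl_seconds=300)
print(cache.get("www.example.com"))  # 192.0.2.1
now[0] = 301.0
print(cache.get("www.example.com"))  # None (expired, must be looked up again)
```

The same mechanism explains why poisoning works: once a wrong address is stored, the cache keeps handing it out until the TTL runs out.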

DNS cache poisoning is where a name is intentionally mapped to the wrong IP address. This is usually done by attackers, who have used it to carry out phishing scams. However, intentional poisoning can also be useful, as I will explain in a moment.

The 'hosts' file is one of the first places Windows looks when it tries to resolve a host name to an IP address. The 'hosts' file is a list of name-to-IP-address mappings and usually contains only a single entry:

127.0.0.1 localhost
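
The format is simple: each line holds an IP address followed by one or more host names, and anything after a '#' is a comment. On Windows XP the file lives at C:\WINDOWS\system32\drivers\etc\hosts. Here is a small Python sketch that reads that format into a dictionary; parse_hosts is a hypothetical helper, and the "first entry wins" rule for duplicate names is an assumption made for the sketch.

```python
def parse_hosts(text):
    """Parse hosts-file text into a {hostname: ip} dictionary.

    Each non-comment line is an IP address followed by one or more host
    names; everything after '#' is ignored. Names are stored lowercased
    because host names are case-insensitive. If a name appears twice, the
    first entry wins (an assumption for this sketch).
    """
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


sample = """
# default Windows hosts file
127.0.0.1       localhost
"""
print(parse_hosts(sample))  # {'localhost': '127.0.0.1'}
```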

Pinging 'localhost' via the command-line 'ping' command displays:

ping localhost
Pinging MyHost [127.0.0.1] with 32 bytes of data:
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128
Reply from 127.0.0.1: bytes=32 time<1ms TTL=128

Ping statistics for 127.0.0.1:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms


Windows looked up 'localhost' in the 'hosts' file and mapped it to 127.0.0.1. With this knowledge, it is possible to map any host name to 127.0.0.1 by editing the 'hosts' file.
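
As a sketch of that idea in Python: block_entries turns a list of unwanted host names into lines you could append to the 'hosts' file, and resolve mimics "check the 'hosts' mappings first, then ask DNS". Both functions are hypothetical, the ad host names are placeholders under the reserved example.com/example.net domains, and the real Windows resolver has more steps (its own cache, for one) that this ignores. Editing the real file usually needs Administrator rights, so keep a backup.

```python
import socket

BLOCK_IP = "127.0.0.1"  # blocked names point back at the local computer


def block_entries(hostnames, ip=BLOCK_IP):
    """Return hosts-file lines mapping each name to `ip`, skipping duplicates."""
    seen = set()
    lines = []
    for name in hostnames:
        name = name.strip().lower()
        if name and name not in seen:
            seen.add(name)
            lines.append(f"{ip}\t{name}")
    return lines


def resolve(name, hosts_map, fallback=socket.gethostbyname):
    """Use the hosts mapping if it has the name; otherwise fall back to DNS."""
    ip = hosts_map.get(name.lower())
    if ip is not None:
        return ip
    return fallback(name)


for line in block_entries(["ads.example.com", "ADS.example.com", "banners.example.net"]):
    print(line)
```

Because the 'hosts' entry is consulted before DNS, the ad server's real address is never even requested.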

Luckily, several other people already do a good job of this:

http://en.wikipedia.org/wiki/Hosts_file
(See the external links at the end - I prefer Mike's Ad Blocking host file over the others.)

However, what has held me back until now is that a 'hosts' file alone is not good enough. Blocked ads either show up as broken images or take forever to load. Or, for those of us who do local web development, they show up as ugly "HTTP 404" error messages from our running web server. That last problem concerns me most: someone could carefully craft a website designed to exploit a web server running on localhost.

The solution to this problem is to use a tool called eDexter (http://www.pyrenean.com/) in combination with a modified 'hosts' file. eDexter is a special web server designed to handle poisoned DNS entries that point to localhost. The modified 'hosts' file poisons the DNS cache, and then eDexter handles the actual web request and returns an image (or a bogus .js or empty .swf file). Unfortunately, the images it ships with are not completely transparent (there are a couple of pink pixels in the middle), so replacing them with completely transparent GIF images is a good idea. As of this writing, eDexter is the best tool for the job (the other free tool, nohttpd, lacks eDexter's feature set).
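
To show what a tool like this does under the hood, here is a rough sketch using Python's standard http.server module. This is not eDexter or nohttpd; PlaceholderHandler, placeholder_for and serve are hypothetical names, and the choice of content types is an assumption. Every request gets a harmless answer: an empty script for .js, an empty file for .swf, and a fully transparent 1x1 GIF for everything else, so blocked ads neither break the page nor trigger a 404.

```python
import posixpath
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# A fully transparent 1x1 GIF (43 bytes).
TRANSPARENT_GIF = (
    b"GIF89a"                                    # header
    b"\x01\x00\x01\x00"                          # width 1, height 1
    b"\x80\x00\x00"                              # 2-color global color table
    b"\x00\x00\x00\xff\xff\xff"                  # color table: black, white
    b"\x21\xf9\x04\x01\x00\x00\x00\x00"          # color index 0 is transparent
    b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"  # image descriptor
    b"\x02\x02\x44\x01\x00"                      # LZW data: one pixel of color 0
    b"\x3b"                                      # trailer
)


def placeholder_for(path):
    """Pick a harmless (content_type, body) pair based on the file extension."""
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    if ext == ".js":
        return "text/javascript", b""                # empty script: nothing runs
    if ext == ".swf":
        return "application/x-shockwave-flash", b""  # empty Flash file
    return "image/gif", TRANSPARENT_GIF              # images and anything else


class PlaceholderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        content_type, body = placeholder_for(self.path)
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # stay quiet; ad requests arrive constantly


def serve(host="127.0.0.1", port=80):
    """Answer every request to host:port with a placeholder, forever."""
    HTTPServer((host, port), PlaceholderHandler).serve_forever()
```

Calling serve() would play the same role eDexter plays on port 80; the transparent GIF also avoids the pink-pixel problem.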

For those who run a web server locally, the solution is to move the local web server to an alternate port (e.g. port 81) or to start it manually instead of every time the computer boots. eDexter, unfortunately, has to listen on port 80 and cannot be configured to use another loopback address (e.g. 127.0.0.2), which would otherwise let it share port 80 with a local web server that only listens on 127.0.0.1. The entire 127.x.x.x range is reserved for the local computer.
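
Two quick checks help when sorting out this conflict, sketched here in Python with hypothetical helper names: whether an address is in the loopback range (all of 127.0.0.0/8, not just 127.0.0.1), and whether something is already holding a port. The port test is only a rough heuristic: it tries to bind the port, and the result can depend on the operating system's socket rules and on which address the other server bound to.

```python
import ipaddress
import socket

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def is_loopback(ip):
    """True for any address in 127.0.0.0/8, such as 127.0.0.2."""
    return ipaddress.ip_address(ip) in LOOPBACK


def port_in_use(port, host="127.0.0.1"):
    """Rough check: if binding the port fails, something else probably owns it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except PermissionError:
            raise  # e.g. ports below 1024 need root on Unix; says nothing about use
        except OSError:
            return True  # most likely "address already in use"
        return False
```

For example, port_in_use(80) returning True before you install eDexter suggests your local web server still needs to be moved to port 81.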

This combination of a 'hosts' file with eDexter is quite impressive and powerful. Combine that with the Google toolbar's popup blocker (toolbar.google.com) and every advertisement is gone. Instantly. A very nice solution to a major annoyance for software developers. Nothing is more annoying than experiencing popups and intrusive ads when hunting for a solution to a coding problem.