Saturday, August 25, 2007

A Most Coincidental Event!


Yesterday I spoke of a most heinous act of computer modification. Today Microsoft spent a huge chunk of the day attempting to solve a major problem.

http://www.boingboing.net/2007/08/25/microsoft_wga_server.html

The WGA (Windows Genuine Advantage - a.k.a. "Disadvantage") servers went completely down. Now, I'm not a huge believer in coincidences, but if Microsoft has ever been caught with its pants down, this is perhaps a double helping (free wedgie!). Let's see here:

1) Microsoft pushes a secret worldwide update through Automatic Updates out to every computer on the planet.
2) My computer receives the update and VerifyMyPC flags it.
3) The WGA servers receive the update, completely unaware of what is happening.
4) The WGA servers barf (perhaps something in the update they didn't like). All of the WGA servers go down.
5) User PCs that try to connect to the WGA servers can't, and are therefore flagged as pirating Windows.
6) Microsoft catches wind of the problem and the employees responsible for WGA head into work to solve it...and spend most of the day scratching their heads.

A PR disaster in the making if I've ever seen one. Had they had VerifyMyPC deployed throughout their organization, they could have avoided it or at least dealt with it a lot sooner (say, figuring it out in 5 minutes instead of wasting hours in the office on a Saturday...time better spent at home).

Edit: Step 4 in the "sequence of events" is kind of vague. There is ALWAYS a reasonable explanation for what happens in a computer - it is just circuits and electricity after all - 0's and 1's. At the time I couldn't think of anything that would trigger a shutdown of the server. I was thinking more along the lines of "some application crashing or BSOD'ing" instead of, well, more reasonable ideas. After thinking about it a bit, I suspect step 3 made an incorrect assumption.

Suppose, for instance, you are in charge of the WGA servers and you are thinking about what hackers will consider a terrific target. The main Microsoft website is high profile but also extremely risky to attack, whereas WGA is a much more "backwater" system, and breaking into it would be a great way to mess with a whole bunch of people at once. So, as the server manager, you look for what is known as an Intrusion Detection System (IDS) and install one on the server. Then you set up a rule that says, "Should a file change, shut off this computer." Then you set Automatic Updates to manual download and install (i.e. "Ask me before doing either one"). Then you put into place a policy that, when an update is available via Automatic Updates, one of the engineers (or a script) is to disable the IDS rule, run the updates, and turn the rule back on. This policy is then applied to all of the WGA servers so the whole lot can be administered at once.
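To make that policy concrete, here is a rough Python sketch of a "file changed, shut down" rule that gets switched off for a planned update and switched back on afterwards. Everything in it (FileChangeRule, maintenance_window, the file paths) is made up for illustration. I have no idea what Microsoft actually runs, and the "shutdown" here just records an event instead of powering anything off.

```python
from contextlib import contextmanager


class FileChangeRule:
    """Hypothetical IDS rule: any file change triggers a shutdown action."""

    def __init__(self, on_trigger):
        self.enabled = True
        self.on_trigger = on_trigger  # stand-in for "shut off this computer"

    def file_changed(self, path):
        # The rule only fires while it is enabled.
        if self.enabled:
            self.on_trigger(path)
            return True
        return False


@contextmanager
def maintenance_window(rule):
    """Disable the rule during a planned update, then restore its prior state."""
    previous = rule.enabled
    rule.enabled = False
    try:
        yield rule
    finally:
        # Restored even if the update step raises an exception.
        rule.enabled = previous


def demo():
    events = []
    rule = FileChangeRule(on_trigger=lambda path: events.append("shutdown: " + path))

    # Planned update: an engineer (or a script) opens a maintenance window first.
    with maintenance_window(rule):
        rule.file_changed("system32/planned.dll")

    # Unplanned update: nobody disabled the rule, so it fires.
    rule.file_changed("system32/secret.dll")
    return events
```

The point of the sketch is that the policy only works if every update goes through the maintenance window. An update that arrives by any other route hits the rule while it is still armed.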

Now Microsoft is huge and the left hand doesn't always know what the right hand is doing. So the Microsoft group responsible for Windows Updates releases a secret Windows Update that bypasses even the manual settings in Automatic Updates.

Every computer, including the WGA servers, downloads this update and installs it. However! The IDS picks up on the fact that critical Windows files have changed. Each system then executes a perfectly flawless shutdown as per the rule set in the IDS. Wham! Bam! WGA is completely down. The reason the engineers spent half a day in the office was probably to figure out what triggered the IDS rule to fire in the first place - even then they possibly didn't figure it out (depending on how good the IDS is - VerifyMyPC caught it on my system). Still, that's over half of their Saturday wasted.
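For the curious, one common way a file-integrity tool notices that files have changed is to compare content hashes against a saved baseline. The Python below is a minimal sketch of that general idea, not VerifyMyPC's actual code and not any particular IDS. The function names and the in-memory "files" are assumptions I made so the example runs on its own.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def snapshot(files):
    """Map each file name to the hash of its contents.

    `files` is a plain dict of name -> bytes standing in for a real disk scan.
    """
    return {name: sha256_of(data) for name, data in files.items()}


def diff_snapshots(baseline, current):
    """Report which files were added, removed, or changed between snapshots."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        name
        for name in baseline.keys() & current.keys()
        if baseline[name] != current[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Because the comparison is on contents, a file whose timestamp was left alone still shows up as changed if even one byte differs.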

I can see a number of people getting yelled at over this:

1) The Windows Update group responsible for the whole mess. First for issuing a secret worldwide update. And then for getting caught.
2) The WGA server group for having a single point of failure that can cause the servers to all go down.
3) Those who programmed the client-side of WGA for assuming that if the WGA servers are unreachable (because the servers are all down), the person must be pirating Windows.
4) The support group (in India?) who said that the WGA servers would be back up sometime on Tuesday.
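
On point 3, the fix I have in mind is to treat "couldn't reach the server" as its own answer instead of lumping it in with "failed validation". Here is a small Python sketch of that idea. The names (Validation, validate, ask_server) are hypothetical and this is in no way the real WGA client logic, just the shape of the check.

```python
from enum import Enum


class Validation(Enum):
    GENUINE = "genuine"
    NOT_GENUINE = "not genuine"
    UNKNOWN = "unknown"  # server unreachable: no verdict either way


def validate(ask_server):
    """Classify a validation attempt.

    `ask_server` is a hypothetical callable that returns True or False when
    the server answers, and raises ConnectionError or TimeoutError when it
    cannot be reached.
    """
    try:
        ok = ask_server()
    except (ConnectionError, TimeoutError):
        # An outage says nothing about the user, so do not accuse them.
        return Validation.UNKNOWN
    return Validation.GENUINE if ok else Validation.NOT_GENUINE
```

With three outcomes instead of two, the client can retry later on UNKNOWN and only flag a machine when a server actually said so.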

5 comments:

  1. I manage lots of PCs that have Automatic Updates enabled, and I see that the two DLLs have not been updated since April (4/16/2007) and they all have the same version of the DLLs (7.0.6000.374). What are the versions of your files that have changed?

  2. damien: You are right. I looked at the message incorrectly. wups.dll is 'added'. Looking at the detailed logs, the file was added to 'C:\WINDOWS\LastGood\system32\'. However, a number of files WERE indeed changed. VerifyMyPC doesn't go by timestamps, it goes by hashes of the files. I've updated the first article with a list of files that were changed.

  3. asdfasdfasdfasd3ddd wrote..."A secret update is bad, but you really don't explain why it went down just "it barfs". Gee, that tells a lot! And why is someone does taking out MSFT's WGA servers seems weird.

    This whole thing seems like a big plug for verifymyPc, which apparently exists? I haven't heard of a single person using this.

    The fact only one other person has left a comment leads me to believe this story was a waste. Especially since only one other person cared enough to comment."

    (Blogger is starting to REALLY annoy me...you can't edit comments for content - either allow or deny the whole thing)

    asdf, let me address your three concerns:

    1) I _DID_ explain the "it barfs" portion later on in the blog entry. I describe in excruciating detail how and why the use of an IDS could trigger the effect the world saw.

    2) The reason you've never heard of VerifyMyPC is that I suck at marketing and rely on word-of-mouth. Those who use VerifyMyPC swear by it up and down as being one of the best tools they have ever used. Just today I had someone call it "the poor man's Tripwire".

    3) If you look, most of the comments are being placed on the other blog entry that started this whole thing (the secret Windows Update). That said, I've had more people on this blog in the past 24 hours than I've had in the past 3 years - yet I blog anyway. I refuse to let a lack of visitors affect my blogging.

  4. If MS is pushing out updates without permission*, maybe it is also doing some verifying on the genuine advantage servers without telling us. So instead of just the verification that we know of - when we download updates, patches, and certain other software - our Windows boxes are constantly calling home to verify their own authenticity?? *I have seen strange activity with those two files before on my PC(s) and I turn updates OFF!

  5. good writings you have here :)
