Yesterday I spoke of a most heinous act of computer modification. Today Microsoft spent a huge chunk of the day attempting to solve a major problem.
The WGA (Windows Genuine Advantage - a.k.a. "Disadvantage") servers went completely down. Now, I'm not a huge believer in coincidences, and if Microsoft has ever been caught with its pants down, this is perhaps a double helping (free wedgie!). Let's see here:
1) Push a secret worldwide update to Automatic Updates out to every computer on the planet.
2) My computer receives the update and VerifyMyPC flags it.
3) WGA servers receive the update, completely unaware of what is happening.
4) WGA servers barf (perhaps something in the update they didn't like). All of the WGA servers go down.
5) User PCs attempting to connect to WGA servers can't and therefore are flagged as pirating Windows.
6) Microsoft catches wind of the problem and employees responsible for WGA head into work to solve the problem...and spend most of the day scratching their heads.
A PR disaster in the making if I've ever seen one. Had they had VerifyMyPC deployed throughout their organization, they could have avoided it or at least dealt with it a lot sooner (as in, figured it out in 5 minutes instead of wasting hours in the office on a Saturday...time better spent at home).
Edit: Step 4 in the "sequence of events" is kind of vague. There is ALWAYS a reasonable explanation for what happens in a computer - it is just circuits and electricity after all - 0's and 1's. At the time I couldn't think of anything that would trigger a shutdown of the server. I was thinking more along the lines of "some application crashing or BSOD'ing" instead of, well, more reasonable ideas. After I thought about it a bit, perhaps step 3 made an incorrect assumption.
Suppose, for instance, you are in charge of the WGA servers and you are thinking about what hackers will consider a terrific target. The main Microsoft website is high profile but also extremely risky to attack, whereas WGA is a much more "backwater" system, and breaking into it would be a great way to mess with a whole bunch of people at once. So, as the server manager, you look for what is known as an Intrusion Detection System (IDS) and install one on the server. Then you set up a rule that says, "Should a file change, shut off this computer." Then you set Automatic Updates to manual download and install (i.e. "ask me before doing either one"). Then you put into place a policy that when there is an update available via Automatic Updates, one of the engineers (or a script) is to turn off the IDS rule, run the updates, and turn the rule back on. This policy is then applied to all of the WGA servers to make the whole thing easier to administer.
Now, Microsoft is huge, and the left hand doesn't always know what the right hand is doing. So the Microsoft group responsible for Windows Update releases a secret update that bypasses even the manual settings in Automatic Updates.
Every computer, including the WGA servers, downloads this update and installs it. However! The IDS picks up on the fact that critical Windows files have changed. Each system then executes a perfectly flawless shutdown as per the rule set in the IDS. Wham! Bam! WGA is completely down. The reason the engineers spent half a day in the office was probably to figure out what triggered the IDS rule in the first place - even then they possibly didn't figure it out (depending on how good the IDS is - VerifyMyPC caught it on my system). Still, that's over half of their Saturday wasted.
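To make the hypothetical a bit more concrete, here is a rough Python sketch of a "should a file change, shut off this computer" rule along with the sanctioned "disable the rule, update, re-enable" policy. This is purely my own illustration: `FileChangeRule`, `snapshot`, `changed_files`, and the `on_violation` callback are made-up names, and I have no idea what IDS (if any) Microsoft actually runs.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map every file under root to the SHA-256 digest of its contents."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(baseline, current):
    """Return the sorted names that were added, removed, or modified."""
    names = set(baseline) | set(current)
    return sorted(n for n in names if baseline.get(n) != current.get(n))


class FileChangeRule:
    """Hypothetical 'should a file change, shut off this computer' rule."""

    def __init__(self, root, on_violation):
        self.root = root
        self.on_violation = on_violation  # assumption: e.g. a shutdown hook
        self.enabled = True
        self.baseline = snapshot(root)

    def check(self):
        """Fire the rule if anything changed while it is enabled."""
        changed = changed_files(self.baseline, snapshot(self.root))
        if changed and self.enabled:
            self.on_violation(changed)
        return changed

    def run_update(self, apply_update):
        """Sanctioned path: disable the rule, update, re-baseline, re-enable."""
        self.enabled = False
        try:
            apply_update()
            self.baseline = snapshot(self.root)
        finally:
            self.enabled = True
```

An update applied through `run_update` leaves the rule quiet because the baseline is refreshed. An update that changes files behind the rule's back makes the next `check()` call `on_violation`, which in the scenario above would be the shutdown.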
I can see a number of people getting yelled at over this:
1) The Windows Update group responsible for the whole mess. First for issuing a secret worldwide update. And then for getting caught.
2) The WGA server group for having a single point of failure that can cause the servers to all go down.
3) Those who programmed the client side of WGA, for assuming that if the WGA servers are unreachable (because the servers are all down), the person is pirating Windows.
4) The support group (in India?) who said that the WGA servers would be back up sometime on Tuesday.
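On point 3, here is a minimal Python sketch of what I would have expected the client to do instead: treat "couldn't reach the server" as its own outcome rather than as proof of piracy. The `Verdict` enum, `validate`, and the `ask_server` callable are all hypothetical names of mine; I don't know how the real WGA client is written.

```python
from enum import Enum


class Verdict(Enum):
    GENUINE = "genuine"
    NOT_GENUINE = "not genuine"
    UNKNOWN = "unknown"  # no answer from the server; not evidence of piracy


def validate(ask_server, last_known=Verdict.UNKNOWN):
    """Ask the server; on a connection failure, keep the last known verdict."""
    try:
        answer = ask_server()  # hypothetical callable returning True or False
    except (ConnectionError, TimeoutError):
        return last_known
    return Verdict.GENUINE if answer else Verdict.NOT_GENUINE
```

With something like this, a machine that validated fine last week stays "genuine" through an outage, and only an actual "no" from the server flags it.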