Tuesday, June 19, 2007

Why Windows Error Reporting (WER) does not work

If you are reading this, you are probably coming from the CubicleSoft website to learn more about Windows Error Reporting. This blog entry will therefore be a little more professional than usual.

Windows Error Reporting, or WER for short, is a set of technologies Microsoft put together for Windows XP and expanded upon for Windows Vista:

http://en.wikipedia.org/wiki/Windows_Error_Reporting

To summarize the Wikipedia article, WER gathers error reports in a central location (Microsoft servers). Software developers can then log into the system, retrieve those error reports, and use them to fix bugs. End-users of Windows see something like this when an application crashes:

[Image: the Windows crash dialog with its "Send Error Report" button]

The user clicks the "Send Error Report" button and the bugs get fixed.

The official website of Windows Error Reporting (WER):

https://winqual.microsoft.com/


All that sounds good on paper until developers start reading the "fine print" on what is required. By default, applications are not WER enabled. That means the WER data sent by the user is completely ignored by the WER system (i.e. information about the crash simply gets thrown out). One would think every software author would be on board with this WER thing.

So, what exactly does it take to integrate WER? First and foremost, just gaining access to the system requires a Verisign Class 3 Digital ID. That costs $400 [US] per year. That staggering price tag alone keeps most developers from joining the WER program in the first place.

Secondly, once a developer gains access to the system, they discover they have to modify their source code. Significantly. Windows Error Reporting requires integration with various Windows APIs, and many of them are quite difficult to use correctly. When they are used incorrectly (which is easy enough to do), the wrong information about the crash can be sent or, in the worst case, a second crash can occur.

Lastly, the developer has to create two installers: one is the usual release, and the other contains the release plus the release's PDB files. A PDB file is used by a debugger to map the running program back to the corresponding location in the original source code. Without the proper PDB files, the developer can't use any crash reports.

Those are just the steps to integrate WER. It is a huge undertaking and quite expensive. But that's just the start. Just integrating WER isn't enough. The developer has to "baby-sit" WER. Here's why: The WER system only tallies crashes until it is told to start collecting data for a specific crash. Only once WER has been told to start collecting data does it do so. Until then, when the user hits the "Send Error Report" button, the error data is simply ignored.
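
To make the tally-versus-collect behavior concrete, here is a small Python model of it. It is only a sketch of the behavior as described above, not Microsoft's implementation: the class name BucketModel, its methods, and the crash signature string used in the example are all made up for illustration.

```python
from collections import Counter


class BucketModel:
    """Toy model of the tally-then-collect behavior (hypothetical, not a real API)."""

    def __init__(self):
        self.tallies = Counter()    # crash signature -> number of reports seen
        self.collecting = set()     # signatures the developer has requested data for
        self.stored_reports = {}    # crash signature -> list of kept payloads

    def request_data(self, signature):
        """Developer action: start keeping report data for one crash signature."""
        self.collecting.add(signature)

    def submit(self, signature, payload):
        """User action: clicking "Send Error Report". Returns True if data was kept."""
        self.tallies[signature] += 1
        if signature not in self.collecting:
            return False            # the crash is counted, but the payload is dropped
        self.stored_reports.setdefault(signature, []).append(payload)
        return True


if __name__ == "__main__":
    model = BucketModel()
    model.submit("app.exe!crash_a", "report 1")   # only tallied
    model.request_data("app.exe!crash_a")         # developer opts in
    model.submit("app.exe!crash_a", "report 2")   # tallied and kept
    print(model.tallies["app.exe!crash_a"], model.stored_reports)
```

In this model, every report sent before the developer calls request_data() raises the tally but leaves nothing to debug with, which is the "baby-sitting" problem in a nutshell.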

Once the developer finally retrieves data from WER, the data is just a stack/variable dump plus whatever information they put into their software to pass on to WER. There is a very good chance that the data will be completely useless. The cause could be anything from a bad PDB/EXE matchup, to a bad stack dump, to threading issues, to missing symbols for some system binary that the developer has no access to, to not knowing what the user was doing at the time, etc. My guess is something along the lines of a 75% failure rate.

Despite these huge hurdles, some non-Microsoft companies DO use WER. Off the top of my head, Valve Corporation uses it in their Steam client. Their games still crash and, well, the bugs haven't been fixed yet. Microsoft uses WER for its own products and actually fixes bugs, but that's about the only exception.

The end result is that no one sends error reports to the Microsoft WER server, mostly because people have figured out that doing so is a waste of time. Now you know why it is a waste of time.

Hopefully I didn't go too far over your head with this blog entry. I did get kind of technical, but if you came here from CubicleSoft, you wanted to learn more about WER. The Crash Reporting Support Tool that is used by CubicleSoft bypasses WER and sends error reports directly to my inbox. It is an elegant solution that does not have the problems WER has and it actually works!

http://www.cubiclesoft.com/Support/
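
For readers who want a feel for what a do-it-yourself crash report can look like, here is a minimal Python sketch. It is not the Crash Reporting Support Tool and not how that tool works internally. It only illustrates the general idea of gathering diagnostics, adding the user's description of what they were doing, and compressing everything into one archive that could then be sent to the developer. The function name build_crash_bundle and the file names inside the archive are assumptions made for this example, and actually sending the archive is left out.

```python
import io
import platform
import sys
import traceback
import zipfile


def build_crash_bundle(exc_type, exc_value, exc_tb, user_note=""):
    """Return the bytes of a ZIP archive describing one crash (illustrative only)."""
    # Stack information for the failure, rendered as plain text.
    stack_text = "".join(traceback.format_exception(exc_type, exc_value, exc_tb))

    # A few basic facts about the machine the crash happened on.
    system_text = "\n".join([
        "os: " + platform.platform(),
        "machine: " + platform.machine(),
        "python: " + sys.version.split()[0],
    ])

    # Pack everything into an in-memory ZIP file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("stack.txt", stack_text)
        archive.writestr("system.txt", system_text)
        archive.writestr("user_note.txt", user_note)
    return buffer.getvalue()


if __name__ == "__main__":
    try:
        1 / 0
    except ZeroDivisionError:
        bundle = build_crash_bundle(*sys.exc_info(), user_note="Clicked Save.")
        print(len(bundle), "bytes ready to send")
```

The point of the design is that the report goes wherever the developer decides, with whatever extra information the developer chooses to include.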

4 comments:

  1. Good info.
    I'm a developer and I'm interested in how you implemented your solution - the one that dumps the info you need. What sort of information are you sending? Where do you intercept the MS exception handling so that yours is executed instead?

  2. While it's true that crash dump analysis is only really useful if you have symbol files even for your release builds, that doesn't mean you have to create two installers.

    The usual approach (which we also use at my company) is to set up a so-called "symbol server" internally. Whenever you build a new release version, you push the resulting .PDB files to that symbol server. And when you need to debug with those symbols (such as when you just downloaded a crashdump file from the Winqual portal), you just point Visual Studio to the symbol server.

    Also, you only need to modify your application if it already tries to catch and handle crashes, i.e. if you have a top-level "catch-all" exception filter. In that case (and if you want to use the WER mechanism, that is), you use either the ReportFault() API or the new Windows Error Reporting APIs (introduced with Vista).

    It is definitely true, though, that using the new WER APIs correctly in such a scenario isn't exactly trivial. More on this at
    http://www.clausbrod.de/Blog/DefinePrivatePublic200706.

    Cheers,

    Claus

  3. James: The information that gets sent is a minidump of the active stack and relevant variables, an XML file containing loaded modules and stack information, a text file containing CPU/OS/RAM/hard drive information, and the user's input as to what they were doing at the time of the crash. And I can add more stuff to that list if I need it. All of that is compressed (ZIP file) and sent to me directly. This solution is somewhat hacky, so I'm not really willing to discuss it publicly (but it does work quite well).

    claus_brod: It is wonderful that you have thousands of dollars to burn and can use the Winqual WER service. The rest of us don't have deep pockets like that. So I'll stick with my solution which works under Vista without changes, sends error reports directly to me, has tons more information, and saves me $450 per year.

    Also, I prefer the method of two installers instead of a symbol server. My PDB files tend to be quite large.

  4. Thomas,

    maybe this didn't come across right: I'm not trying to convince you to use WER/Winqual. We didn't like the extra costs for that ID, either, and I share the concerns over some of the other points you mentioned.

    Just wanted to point out that if you don't have a top-level exception handler anyway, and if you archive your release build PDBs, then at least two issues you mentioned disappear.

    You may want to discuss your concerns in the Microsoft forum on Windows Error Reporting - a couple of Microsoft engineers are listening, and I've found them quite responsive, too.

    BTW, the PDB files which we archive on our symbol server are up to 100 MB in size; this worked just fine for us.

    Cheers,

    Claus

    http://www.clausbrod.de/Blog
