Today we do a deep dive into a portion of the Portable Executable (PE) file format and discover some rather broken behavior in the Microsoft Windows OS that causes some processes to not start up because the Windows loader gets confused about what an RVA is.
The PE file format is how all executables (i.e. EXEs) and things containing executable code (e.g. DLLs) are loaded into memory and run. It's a highly structured file format and has many features not found in other executable formats for other OSes and has many nuances and details surrounding it, some of which are intentionally obscured. One of the more complicated concepts for someone coming into the PE file format to understand is the Relative Virtual Address, or RVA for short. Microsoft defines the RVA as:
"The address of an item after it is loaded into memory, with the base address of the image file subtracted from it in an image file."
Another way of saying it is that an RVA is the offset from the base address where the file is loaded into memory. The Windows executable loader is responsible for reading a file off disk, mapping PE sections into RAM, performing relocations if necessary, and beginning code execution. This behavior will be important later.
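To make the definition concrete, here is a minimal sketch of the arithmetic in Python. The function names and the example base address are illustrative assumptions, not part of any Windows API.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Return the virtual address of an item given the image's load address."""
    return image_base + rva


def va_to_rva(image_base: int, va: int) -> int:
    """Return the RVA of a virtual address inside a loaded image."""
    if va < image_base:
        raise ValueError("address lies below the image base")
    return va - image_base


# Example: an image loaded at 0x00400000 (an assumed, illustrative base).
print(hex(rva_to_va(0x00400000, 0x1000)))
print(hex(va_to_rva(0x00400000, 0x00401000)))
```

The point is that an RVA stays the same no matter where the image ends up in memory; only the base changes.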
When programs other than the Windows loader work with PE files on disk, the RVA is NOT the convenient thing to work with; physical file offsets are. The PE file format makes translating RVAs to physical file offsets a bit complicated: read an RVA, walk the PE section list looking for the section that contains it, compute the position in the physical file, and be aware that a section's size on disk may differ from its size in memory for efficiency. That is, a PE file generally assumes it will be run as a real executable in RAM.
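The translation just described might look roughly like the sketch below. The `Section` tuple and `rva_to_file_offset` are hypothetical names for illustration; the fields mirror the section header's VirtualAddress, VirtualSize, PointerToRawData, and SizeOfRawData. It does not handle RVAs that fall inside the file headers.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the section starts in memory
    virtual_size: int     # size of the section in memory
    raw_pointer: int      # PointerToRawData: file offset of the section data
    raw_size: int         # SizeOfRawData: bytes actually stored in the file


def rva_to_file_offset(sections: Sequence[Section], rva: int) -> Optional[int]:
    """Map an RVA to a physical file offset, or None if it has no file backing."""
    for s in sections:
        span = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                # In memory but not on disk (e.g. zero-filled tail of a section).
                return None
            return s.raw_pointer + delta
    return None  # The RVA is not inside any section.


sections = [
    Section(".text", 0x1000, 0x500, 0x400, 0x600),
    Section(".rsrc", 0x3000, 0x200, 0xA00, 0x200),
]
print(hex(rva_to_file_offset(sections, 0x3010)))
```

Note that the in-memory span and the on-disk span are checked separately, which is where the "sections may not report their real sizes" caveat shows up.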
When the Windows loader runs a program, it reads in the section list and uses the associated flags for each section to set up memory with the appropriate protections (conceptually, the equivalent of VirtualAllocEx() calls). After each allocation, it loads the section data off disk into RAM. Once the entire image is loaded into RAM, it walks the PE data directories looking for various additional things to do. If the base address was adjusted, the Base Relocation Table is processed. If it is a driver or critical executable, a digital certificate is validated. One of the most well-known and popular PE data directories is the Import Table, which defines additional "modules" (really just DLLs) that are required for the executable to function. So the Windows loader recursively walks down the import list until everything is properly mapped into the process space. Once everything is loaded and validated, the executable can begin running.
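For readers who want to see where those data directories live, here is a small sketch that reads the first few entries out of a PE header. It follows the header layout in the PE/COFF specification; `read_data_directories` and `DIRECTORY_NAMES` are names made up for this example, and it is not how the Windows loader itself is implemented.

```python
import struct

# The first six data directory slots, in the order the specification lists them.
DIRECTORY_NAMES = ["Export", "Import", "Resource", "Exception",
                   "Certificate", "Base Relocation"]


def read_data_directories(image: bytes) -> dict:
    """Return {name: (rva, size)} for the first few PE data directories."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    optional = pe_offset + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic == 0x10B:      # PE32
        count_offset = 92
    elif magic == 0x20B:    # PE32+
        count_offset = 108
    else:
        raise ValueError("unknown optional header magic")
    (count,) = struct.unpack_from("<I", image, optional + count_offset)
    table = optional + count_offset + 4
    directories = {}
    for index, name in enumerate(DIRECTORY_NAMES[:count]):
        # Each entry is 8 bytes: a 4-byte RVA followed by a 4-byte size.
        # (For the Certificate entry the first field is a file offset, not an RVA.)
        directories[name] = struct.unpack_from("<II", image, table + index * 8)
    return directories
```

Called on the bytes of a real executable, the "Resource" entry gives the RVA and size of the Resource Table that the rest of this post is about.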
Or does it? Right after the Import Table comes the Resource Table. The PE file format stores all kinds of things, including per-language resources used by the application such as icons, cursors, menus, dialogs, and more. The Windows loader itself also looks for one resource in the Resource Table in particular: The application Manifest. One day, while Microsoft NT kernel module employees were huffing lines of Kool-Aid powder, they came up with the abomination known as Side-by-Side (WinSxS) and XML Manifests: "You know what - snnckk - would be a great idea? Let's throw an XML parser into the Windows loader and see what happens. Snnnckkkkk. [cough, wheeze, eyes-crossed]" "Great idea maaaaaan. This is some siiiick juice duuuuude. I'm baked out of my mind now but let's totally do that when we get back to our desks. At the moment I'm thinking about going over and licking that wall." Sigh. There's no rational reason for a full-blown XML parser to ever be included in any application loader. That's just begging for a limitless myriad of security vulnerabilities/exploits. Anywho, I digress.
The Resource Table is a structured table that defines a tree of nodes. In theory, it could have been a tree of unlimited depth but it never went past three levels deep: Type (icon, dialog, manifest), ID or name, and language. The tree itself uses offsets to refer to later levels in the tree. However, the leaf nodes use RVAs. Just so you don't have to go look up the PE/COFF specification, here's what a leaf node looks like according to the specification:
Data RVA (4 bytes): The address of a unit of resource data in the Resource Data area.
Size (4 bytes): The size, in bytes, of the resource data that is pointed to by the Data RVA field.
Codepage (4 bytes): The code page that is used to decode code point values within the resource data. Typically, the code page would be the Unicode code page.
Reserved (4 bytes): Must be 0.
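As a rough illustration, the 16-byte leaf node above can be unpacked as four little-endian 32-bit values. The `ResourceDataEntry` tuple and `parse_resource_data_entry` are hypothetical names chosen for this sketch.

```python
import struct
from typing import NamedTuple


class ResourceDataEntry(NamedTuple):
    data_rva: int   # RVA of the resource data
    size: int       # size of the resource data in bytes
    codepage: int   # code page used to decode the data
    reserved: int   # must be 0


def parse_resource_data_entry(data: bytes, offset: int = 0) -> ResourceDataEntry:
    """Unpack one 16-byte resource leaf node starting at `offset`."""
    return ResourceDataEntry(*struct.unpack_from("<IIII", data, offset))


# Made-up values, packed and then parsed back.
raw = struct.pack("<IIII", 0x3100, 0x2E8, 1252, 0)
print(parse_resource_data_entry(raw))
```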
For the purposes of the remainder of this discussion, you will probably need an executable to look at and at least one tool to look at the file's structure.
32_pe_data_dir_resources_external_rva_before.dat
PEview - For viewing the above file
If you look at the above PE file, it is notable that the PE resource data directory comes AFTER all of the individual items in the resource directory. This executable was a "mistake" because I assumed Windows would treat Data RVAs as RVAs and it was easier in code to generate a modified PE resource directory by writing leaf nodes before the main data directory as doing so required fewer precalculations. If you rename the file to have a .exe extension, you'll see that Windows Explorer loads and displays the application icon from the resource directory table of the executable. This means that Explorer treats RVAs as RVAs - at least as far as icons go. When attempting to run the executable, however, the Windows loader won't even start the application.
So what's going on here? It was a real head-scratcher until I ran into this StackOverflow post, which contains an answer with this structure:
typedef struct _IMAGE_RESOURCE_DATA_ENTRY {
    DWORD OffsetToData;
    DWORD Size;
    DWORD CodePage;
    DWORD Reserved;
} IMAGE_RESOURCE_DATA_ENTRY, *PIMAGE_RESOURCE_DATA_ENTRY;

The first item is "OffsetToData" instead of an RVA. Except the data values are RVAs, not offsets. So after much more head-scratching, this is what I figured out is going on: Windows Explorer translates the RVA of an icon to a file offset, treating the RVA as an RVA as it should, and so it finds the icon. The Windows loader, on the other hand, is either comparing the RVA of the individual item it is retrieving to the resource data directory RVA first, or just subtracting a larger RVA from the smaller and thereby getting a really large positive value well outside the total image size (e.g. 15 - 20 = 0xFFFFFFFB = 4,294,967,291 as an unsigned 32-bit value). Regardless, instead of just assuming an RVA is an RVA, the Windows loader is monkeying around with it as if it were a pseudo-RVA and either deciding it is invalid OR winding up way out in the middle of nowhere in the process' virtual space (or both). Since the data directory says there is a manifest, the Windows loader attempts to find it and then fails to load the application when it miscalculates its location even though it exists and is a technically valid RVA.
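The wraparound arithmetic in that guess is easy to reproduce. The sketch below is only a model of the hypothesis: both functions are hypothetical and are not taken from any actual Windows loader code, which I have not seen.

```python
MASK32 = 0xFFFFFFFF


def offset_in_directory_u32(data_rva: int, directory_rva: int) -> int:
    """Subtract as a 32-bit unsigned value, the way a DWORD subtraction would."""
    return (data_rva - directory_rva) & MASK32


def passes_hypothetical_bounds_check(data_rva: int, data_size: int,
                                     directory_rva: int, directory_size: int) -> bool:
    """Assumed check: the data must lie inside the resource directory's range."""
    offset = offset_in_directory_u32(data_rva, directory_rva)
    return offset + data_size <= directory_size


# Data placed BEFORE the directory wraps around to a huge offset.
print(hex(offset_in_directory_u32(15, 20)))
print(passes_hypothetical_bounds_check(0x2F00, 0x100, 0x3000, 0x400))
print(passes_hypothetical_bounds_check(0x3100, 0x100, 0x3000, 0x400))
```

Under this model, resource data located after the start of the directory passes, while perfectly valid RVAs located before it are rejected, which matches the behavior observed with the test executable.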
The conclusion is that some parts of Windows accept and correctly handle the slightly strange resource directory of this executable (e.g. Explorer) while others do not (e.g. the Windows loader). So who is right here? Well if Microsoft employees hadn't been drawing Kool-Aid lines from the drug cart (hey, Microsoft has or at least had beer carts on their campus at one point and even Apple placed this guy on their most important stage, so...), they wouldn't have made the extremely bad decision to add an XML parser to the Windows loader, and therefore the application would most likely have started up because the Windows loader wouldn't be pointlessly walking through the application's resource directory and getting lost on basic RVAs. It's the only rational explanation I can come up with that's also HIGHly entertaining.
For more "fun" be sure to check out the Windows Portable Executable Tools for PHP, which comes with a powerful command-line tool written in PHP and PHP classes that extract information, modify, and even create files from scratch in the PE file format. I fully expect a comment like, "PHP was definitely not meant for this." Good times!