Sunday, July 5, 2009

The Portable Executable

Introduction

Microsoft introduced a new executable file format with Windows NT. This format is called the Portable Executable (PE) format because it is supposed to be portable across all of Microsoft's 32-bit operating systems. The same PE executable can run on any version of Windows NT, Windows 95, and Win32s. The same format is also used for executables for Windows NT running on processors other than Intel x86, such as MIPS, Alpha, and PowerPC. The 32-bit DLLs and Windows NT device drivers also follow the PE format.

It is helpful to understand the PE file format because PE files are almost identical on disk and in RAM. Learning about the PE format also helps in understanding many operating system concepts: for example, how the operating system loader supports dynamic linking of DLL functions, and which data structures are involved in dynamic linking, such as the import table and the export table.

The PE format is not really undocumented. The WINNT.H file has several structure definitions representing the PE format. The Microsoft Developer Network (MSDN) CD-ROMs contain several descriptions of the PE format. However, these descriptions come in bits and pieces and are by no means complete.

Microsoft also provides a DLL with the SDK that has utility functions for interpreting PE files. We also discuss these functions and correlate them with other information about the PE format.


OVERVIEW OF A PE FILE

In this section, we discuss the overall structure of a PE file. In the sections that follow, we go into detail about the PE format. A PE file comprises various sections. Because Microsoft’s 32-bit operating systems follow the flat memory model, an executable no longer contains segments. Still, different parts of an executable, such as code and data, have different characteristics. These different parts of an executable are stored as different sections. Thus, a PE file is a concatenation of data stored in sections.

A few sections are always present in a PE file generated by the Microsoft linker. Other linkers may generate similar sections with different names. A PE file generated with the Microsoft linker has a .text section that contains the code bytes concatenated from all the object files. As for the data, it can be classified into different categories. The .data section contains all the initialized global and static data, while the .bss section contains the uninitialized data. The read-only data, such as string literals and constants, is stored in the .rdata section. This section also contains some other read-only structures, such as the debug directory, the Thread Local Storage (TLS) directory, and so on. The .edata section contains information about the functions exported from a DLL, while the .idata section stores information about the functions imported by an executable or a DLL. The .rsrc section contains various resources, such as menus and dialog boxes. The .reloc section stores the information required for relocating the image while loading.

The names of the sections do not have any significance. As mentioned earlier, different linkers may use different names for the sections. Programmers can also create new sections of their own. The #pragma code_seg and #pragma data_seg directives can be used to create new sections when working with the Microsoft compiler. The operating system loader does not rely on section names; it locates the required pieces of information through the data directories present in the file headers. Shortly, we will present an overview of the file headers and then look at them in more detail.


STRUCTURE OF A PE FILE

Apart from the sections consisting of the actual data, a PE file contains various headers that describe the sections and the important information present in the sections.

If you look at the hex dump of a PE file, the first 2 bytes might look familiar. Aren’t they M and Z? Yes, a PE file starts with the DOS executable header. It is followed by a small program that prints an error message saying that the program cannot be run in DOS mode. It’s the same idea that was used in 16-bit Windows executables. This stub program is executed if the PE image is run under DOS.

After the DOS header and the DOS executable stub comes the PE header. A field in the DOS header (e_lfanew, at file offset 0x3C) holds the file offset of this new header. The PE header starts with the 4-byte signature “PE” followed by two nulls. The PE format is based on the Common Object File Format (COFF) used by Unix. The PE signature is followed by the object file header borrowed from COFF. This header is also present in the object files produced by Microsoft’s 32-bit compilers. It contains some general information about the file, such as the target machine ID, the number of sections in the file, and so forth.

The COFF-style header is followed by the optional header. This header is optional in the sense that it is not required for object files. As far as executables and DLLs are concerned, this header is mandatory. The optional header has two parts. The first part is inherited from COFF and can be found in all COFF files. The second part is an NT-specific extension of COFF. Apart from other NT-specific information, such as the subsystem type, this part also contains the data directory. The data directory is an array in which each entry points to some important piece of information. One of the entries in the data directory points to the import table of the executable or DLL, another entry points to the export table of the DLL, and so on.
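To make the first steps of this layout concrete, here is a minimal sketch in portable C that checks the MZ signature, follows the DOS header field that points to the PE header, and checks the PE signature. The helper names read_u32le and find_pe_header are our own and do not come from WINNT.H or IMAGEHLP.DLL; the sketch assumes the whole file has already been read into a buffer and that the pointer field (e_lfanew in WINNT.H) sits at file offset 0x3C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * find_pe_header (hypothetical helper): returns the file offset of the
 * "PE\0\0" signature, or -1 if the buffer does not look like a PE file.
 */
long find_pe_header(const unsigned char *buf, size_t len)
{
    uint32_t pe_off;

    if (len < 0x40)                       /* too small to hold a DOS header */
        return -1;
    if (buf[0] != 'M' || buf[1] != 'Z')   /* DOS executable signature */
        return -1;

    pe_off = read_u32le(buf + 0x3C);      /* e_lfanew: offset of the PE header */
    if (pe_off > len - 4)                 /* signature must fit in the buffer */
        return -1;
    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return -1;

    return (long)pe_off;
}
```

The COFF-style header starts 4 bytes after the offset returned by this function.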

The data directory is followed by the section table. The section table is an array of section headers. A section header summarizes the important information about the respective section. Finally, the section table is followed by the sections themselves.
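Because the size of the optional header is recorded in the COFF-style header, the position of the section table can be computed without parsing the optional header at all. The following sketch assumes the usual layout of the COFF header as declared by IMAGE_FILE_HEADER in WINNT.H (20 bytes, with the section count at offset 2 and the optional header size at offset 16). The function name section_table_offset and the constants are hypothetical names chosen for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value without assuming alignment. */
static uint16_t read_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Sizes and field offsets, relative to the start of the COFF header. */
enum {
    PE_SIG_SIZE              = 4,   /* "PE\0\0" */
    COFF_HEADER_SIZE         = 20,  /* sizeof(IMAGE_FILE_HEADER) */
    COFF_NUM_SECTIONS_OFF    = 2,   /* NumberOfSections */
    COFF_OPT_HEADER_SIZE_OFF = 16   /* SizeOfOptionalHeader */
};

/*
 * section_table_offset (hypothetical helper): given the file offset of the
 * PE signature, returns the file offset of the first section header and
 * stores the number of sections in *num_sections. Returns -1 if the
 * headers do not fit in the buffer.
 */
long section_table_offset(const unsigned char *buf, size_t len,
                          size_t pe_off, unsigned *num_sections)
{
    const unsigned char *coff;
    uint16_t opt_size;

    if (pe_off > len ||
        len - pe_off < (size_t)(PE_SIG_SIZE + COFF_HEADER_SIZE))
        return -1;

    coff = buf + pe_off + PE_SIG_SIZE;
    *num_sections = read_u16le(coff + COFF_NUM_SECTIONS_OFF);
    opt_size = read_u16le(coff + COFF_OPT_HEADER_SIZE_OFF);

    /* The section table begins right after the optional header. */
    return (long)(pe_off + PE_SIG_SIZE + COFF_HEADER_SIZE + opt_size);
}
```

The sketch does not verify that the section table itself lies within the buffer; a real parser would check that as well before reading the section headers.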

We hope that this gives you an overview of the organization of a PE file. Before diving into the details of the PE format, let’s discuss a concept that is vital in interpreting a PE file.


RELATIVE VIRTUAL ADDRESS

Most of the addresses stored within a PE file are expressed as Relative Virtual Addresses (RVAs). An RVA is an offset from the base address at which an executable is loaded in memory. This is not the same as the offset within the file because of the section alignment requirements. The PE header specifies the section alignment requirements for an executable image. A section has to be loaded at a memory address that is a multiple of the section alignment. The section alignment has to be a multiple of the page size. This is because different sections have different page attribute requirements; for example, the .data section needs read-write permissions, while the .text section needs read-execute permissions. Hence, a page cannot span section boundaries.
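As a small worked example of the alignment rule, the sketch below rounds an address up to the section alignment. It assumes the alignment is a power of two, which holds for page-sized multiples such as 0x1000; the names align_up and next_section_rva are hypothetical.

```c
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

/*
 * next_section_rva (hypothetical helper): given the RVA and in-memory size
 * of one section, return the lowest RVA at which the following section
 * may start.
 */
uint32_t next_section_rva(uint32_t rva, uint32_t virtual_size,
                          uint32_t section_alignment)
{
    return align_up(rva + virtual_size, section_alignment);
}
```

For instance, with a section alignment of 0x1000, a section loaded at RVA 0x1000 that occupies 0x2345 bytes in memory ends at 0x3345, so the next section cannot start before RVA 0x4000.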

Because the PE format mostly talks in terms of RVAs, it’s difficult to find the location of the required information within a file. A common practice while accessing a PE file is to map the file in memory using the Win32 memory mapping API. It’s a bit complicated to calculate the address for a given RVA in this memory-mapped file. You first need to find the section in which the given RVA lies. You can accomplish this by iterating through the section table. Each section header stores the starting RVA of the section and the size of the section. A section is guaranteed to be contiguously loaded in memory. Hence, the offset of a particular piece of data from the start of its section is bound to be the same whether the file is memory mapped or loaded by the operating system loader for execution. To find the address in a memory-mapped file, you therefore simply add this offset to the base address of the section in the memory-mapped file. This base address, in turn, is the base address of the mapping plus the file offset of the section, which is also stored in the respective section header. Quite an easy procedure, isn’t it?
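The procedure just described can be sketched in portable C as follows. The section_info structure and both function names are hypothetical; section_info keeps only three of the fields found in the real IMAGE_SECTION_HEADER from WINNT.H (the starting RVA, the in-memory size, and the file offset of the section data).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the IMAGE_SECTION_HEADER fields used here. */
typedef struct {
    uint32_t virtual_address;      /* starting RVA of the section */
    uint32_t virtual_size;         /* size of the section in memory */
    uint32_t pointer_to_raw_data;  /* file offset of the section's data */
} section_info;

/* Return the section whose RVA range contains rva, or NULL if none does. */
const section_info *find_section(const section_info *sections,
                                 size_t count, uint32_t rva)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const section_info *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size)
            return s;
    }
    return NULL;
}

/*
 * Translate an RVA into a pointer inside a file mapped at base:
 * mapping base + file offset of the section + offset within the section.
 */
const void *rva_to_mapped_ptr(const void *base,
                              const section_info *sections,
                              size_t count, uint32_t rva)
{
    const section_info *s = find_section(sections, count, rva);

    if (s == NULL)
        return NULL;
    return (const unsigned char *)base + s->pointer_to_raw_data +
           (rva - s->virtual_address);
}
```

This sketch ignores two details that a robust tool would have to handle: RVAs that fall inside the headers rather than inside any section, and the part of a section that exists only in memory because the data stored in the file is smaller than the in-memory size.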

Undocumented 32-bit WinNT APIs

ImageRvaToVa()

Description:

LPVOID ImageRvaToVa(
    PIMAGE_NT_HEADERS NtHeaders,
    LPVOID Base,
    DWORD Rva,
    PIMAGE_SECTION_HEADER *LastRvaSection
);

PARAMETERS
NtHeaders: Pointer to an IMAGE_NT_HEADERS structure. This structure represents the PE header and is defined in the WINNT.H file. A pointer to the PE header within a PE file can be obtained using the ImageNtHeader() function exported by IMAGEHLP.DLL.
Base: Base address where the PE file is mapped into memory using the Win32 API for the memory mapping of files.
Rva: The relative virtual address to translate.
LastRvaSection: Optional parameter; you can pass NULL. When specified, it points to a variable that holds the section used by the previous RVA-to-VA translation for the specified image. This is used for optimizing the section search, in case the given RVA falls within the same section as the one found by the previous call to the function. The section in LastRvaSection is checked first, and the regular sequential search through the section table is carried out only if the given RVA does not fall within it.

RETURN VALUES
If the function succeeds, the return value is the virtual address in the mapped file; otherwise, it is NULL. The error number can be retrieved using the GetLastError() function.
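
The following sketch illustrates the kind of optimization that the LastRvaSection parameter enables. It is not the IMAGEHLP.DLL implementation, only a portable approximation written from the description above; the sect_hdr structure, section_contains, and rva_to_va_cached are hypothetical names, and sect_hdr models just three fields of the real section header.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the section header fields used here. */
typedef struct {
    uint32_t virtual_address;      /* starting RVA of the section */
    uint32_t virtual_size;         /* size of the section in memory */
    uint32_t pointer_to_raw_data;  /* file offset of the section's data */
} sect_hdr;

static int section_contains(const sect_hdr *s, uint32_t rva)
{
    return rva >= s->virtual_address &&
           rva - s->virtual_address < s->virtual_size;
}

/*
 * Translate rva to an address in a file mapped at base. last_section may
 * be NULL; otherwise it caches the section that satisfied the last call.
 */
const void *rva_to_va_cached(const void *base, const sect_hdr *sections,
                             size_t count, uint32_t rva,
                             const sect_hdr **last_section)
{
    const sect_hdr *hit = NULL;
    size_t i;

    if (last_section != NULL && *last_section != NULL &&
        section_contains(*last_section, rva)) {
        /* Fast path: same section as the previous translation. */
        hit = *last_section;
    } else {
        /* Slow path: sequential search through the section table. */
        for (i = 0; i < count; i++) {
            if (section_contains(&sections[i], rva)) {
                hit = &sections[i];
                break;
            }
        }
    }

    if (hit == NULL)
        return NULL;
    if (last_section != NULL)
        *last_section = hit;       /* remember the section for next time */

    return (const unsigned char *)base + hit->pointer_to_raw_data +
           (rva - hit->virtual_address);
}
```

A caller that translates many RVAs from the same table, such as the entries of an import table, would initialize one cache variable to NULL and pass its address on every call, so that most lookups skip the sequential search.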

ImageNtHeader()


The ImageRvaToVa() function needs a pointer to the PE header. The ImageNtHeader() function exported from IMAGEHLP.DLL provides this pointer.
PIMAGE_NT_HEADERS ImageNtHeader(
    LPVOID ImageBase
);

PARAMETERS
ImageBase: Base address where the PE file is mapped into memory using the Win32 API for the memory mapping of files.

RETURN VALUES
If the function succeeds, the return value is a pointer to the IMAGE_NT_HEADERS structure within the mapped file; otherwise, it returns NULL.

MapAndLoad()


IMAGEHLP.DLL can also take care of memory mapping a PE file for you. The MapAndLoad() function maps the requested PE file in memory and fills in a LOADED_IMAGE structure with some useful information about the mapped file.
BOOL MapAndLoad(
    LPSTR ImageName,
    LPSTR DllPath,
    PLOADED_IMAGE LoadedImage,
    BOOL DotDll,
    BOOL ReadOnly
);

PARAMETERS
ImageName: Name of the PE file that is loaded.
DllPath: Path used to locate the file if the name provided cannot be found. If NULL is passed, then normal rules for searching using the PATH environment variable are applied.
LoadedImage: The structure LOADED_IMAGE is defined in the IMAGEHLP.H file. The structure has the following members:
    ModuleName: Name of the loaded file.
    hFile: Handle obtained through the call to CreateFile.
    MappedAddress: Memory address where the file is mapped.
    FileHeader: Pointer to the PE header within the mapped file.
    LastRvaSection: The function sets it to the first section (see ImageRvaToVa).
    NumberOfSections: Number of sections in the loaded PE file.
    Sections: Pointer to the first section header within the mapped file.
    Characteristics: Characteristics of the PE file (this is explained in more detail later in this chapter).
    fSystemImage: Flag indicating whether it is a kernel-mode driver/DLL.
    fDOSImage: Flag indicating whether it is a DOS executable.
    Links: List of loaded images.
    SizeOfImage: Size of the image.

The function sets the members in the structure appropriately after loading the PE file.

DotDll: If the file needs to be searched and does not have an extension, then either the .exe or the .dll extension is used. If the DotDll flag is set to TRUE, the .dll extension is used; otherwise, the .exe extension is used.
ReadOnly: If the flag is set to TRUE, the file is mapped as read-only.