It shows the presence of kernel32!UnhandledExceptionFilter calls. Let’s open
TestDefaultDebugger.exe in WinDbg, put a breakpoint on the UnhandledExceptionFilter
function and trace the execution. We have to change the return value of IsDebugPortPresent
to simulate the normal fault handling logic when no active debugger is attached:
0:000> bp kernel32!UnhandledExceptionFilter
0:000> g
(fb0.1190): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling
This exception may be expected and handled
eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8
77655a3a jne kernel32!UnhandledExceptionFilter+0x22 (776559a6) [br=0]
Next, we continue to step over using the p command until we see the
WerpReportExceptionInProcessContext function and step into it:
At this point if we look at the stack trace we would see:
After that we step over again and find that the code flow returns from all
exception handlers until the KiUserExceptionDispatcher function raises the exception again via
a ZwRaiseException call.
So it looks like the default unhandled exception filter in Vista only reports the
exception and doesn’t launch the error reporting process that displays the error box,
WerFault.exe.
If we click on the Debug button on the error reporting dialog to launch the
postmortem debugger (I have Visual Studio Just-In-Time Debugger configured in the
AeDebug\Debugger registry key) and look at its parent process by using Process
Explorer, for example, we would see it is WerFault.exe, which in turn has svchost.exe as
its parent.
Now we quit WinDbg and launch the TestDefaultDebugger application again, push its
big crash button and, when the error reporting dialog appears, attach another
instance of WinDbg to the svchost.exe process hosting the Windows Error Reporting Service
(wersvc.dll).
We see the following threads:
4 Id: f8c.1b38 Suspend: 1 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr
00d3fe08 77a10850 ntdll!KiFastSystemCallRet
00d3fe0c 77a1a1b4 ntdll!NtWaitForWorkViaWorkerFactory+0xc
Next, if we look at the CWerService::ReportCrashKernelMsg code we would see it calls
CWerService::ReportCrash, which in turn loads faultrep.dll:
71cb6f17 push dword ptr [ebp-34h]
71cb6f1a push dword ptr [ebp-2Ch]
71cb6f1d call dword ptr [wersvc!_imp__GetCurrentProcessId (71cb1120)]
71cb7045 mov dword ptr [ebp-4],edi
71cb7048 push offset wersvc!`string' (71cb711c)
71cb704d call dword ptr [wersvc!_imp__LoadLibraryW (71cb1144)]
71cb7053 mov dword ptr [ebp-2Ch],eax
71cb7056 cmp eax,edi
71cb7058 je wersvc!CWerService::ReportCrash+0x52 (71cb9b47)
wersvc!CWerService::ReportCrash+0x88:
71cb705e push offset wersvc!`string' (71cb7100)
71cb7063 push eax
71cb7064 call dword ptr [wersvc!_imp__GetProcAddress (71cb1140)]
71cb706a mov ebx,eax
0015de60 77a10690 ntdll!KiFastSystemCallRet
0015de64 77607e09 ntdll!ZwWaitForMultipleObjects+0xc
1 Id: 1bfc.894 Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr
024afbf8 77a10690 ntdll!KiFastSystemCallRet
024afbfc 77607e09 ntdll!ZwWaitForMultipleObjects+0xc
024afc98 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0x11d
024afcec 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0x13c
024afd0c 74fa2cb6 DUser!CoreSC::Wait+0x59
024afd34 74fa2c55 DUser!CoreSC::WaitMessage+0x54
024afe40 75036beb comctl32!SHFusionDialogBoxIndirectParam+0x2d
024afe74 6d4a65a4 comctl32!CTaskDialog::Show+0x100
024afebc 6d4acb72 wer!IsolationAwareTaskDialogIndirect+0x64
024aff4c 6d4acc39 wer!CInitialConsentUI::InitialDlgThreadRoutine+0x369
024aff54 77603833 wer!CInitialConsentUI::Static_InitialDlgThreadRoutine+0xd
024aff60 779ea9bd kernel32!BaseThreadInitThunk+0xe
2 Id: 1bfc.1a04 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr
012bf998 77a10690 ntdll!KiFastSystemCallRet
012bf99c 77607e09 ntdll!ZwWaitForMultipleObjects+0xc
012bfa38 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0x11d
012bfa8c 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0x13c
012bfaac 74fa1642 DUser!CoreSC::Wait+0x59
012bfae0 74fac442 DUser!CoreSC::xwProcessNL+0xaa
Next, we put a breakpoint on CreateProcess, push the Debug button on the error
reporting dialog and, when the breakpoint is hit, inspect the CreateProcess parameters:
0:003> asm no_code_bytes
Assembly options: no_code_bytes
ESP points to the return address, ESP+4 points to the first CreateProcess parameter
and ESP+8 points to the second parameter. The thread stack now involves faultrep.dll:
Therefore it looks like calls to the faultrep.dll module to report faults and launch the
postmortem debugger were moved from UnhandledExceptionFilter to WerFault.exe in
Vista.
Finally, let’s go back to our UnhandledExceptionFilter function. If we disassemble
it we would see that it can call kernel32!WerpLaunchAeDebug too:
77655c5f push dword ptr [ebp-28h]
77655c62 push dword ptr [ebp-1Ch]
77655c65 push dword ptr [ebx+4]
77655c68 push dword ptr [ebx]
77655c6a push 0FFFFFFFEh
77655c6c call kernel32!GetCurrentProcess (775e9145)
77655c92 mov eax,dword ptr [ebx]
77655c94 push dword ptr [eax]
77655c96 push 0FFFFFFFFh
77655c98 call dword ptr [kernel32!_imp__NtTerminateProcess (775c14bc)]
If we look at the WerpLaunchAeDebug code we would see that it calls CreateProcess
too and the code is the same as in faultrep.dll. This could mean that faultrep.dll imports
that function from kernel32.dll. Therefore some postmortem debugger launching code
is still present in the default unhandled exception filter, perhaps for compatibility or in
case WER doesn’t work or is disabled.
A high-level description of the differences between Windows XP and Vista
application crash support can be found in the following article by Mark Russinovich:
Inside the Windows Vista Kernel: Part 3 (Enhanced Crash Support)
(http://www.microsoft.com/technet/technetmag/issues/2007/04/VistaKernel/)
ANOTHER LOOK AT PAGE FAULTS
Recently I observed this bugcheck with a reported “valid” address (in bold):
DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address
at an interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: e16623fc, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: ae2b222e, address which referenced memory
TRAP_FRAME: a54a4a40 (.trap 0xffffffffa54a4a40)
Pool page e16623fc region is Paged pool
e1662000 size: 3a8 previous size: 0 (Allocated) NtfF
e16623a8 size: 10 previous size: 3a8 (Free) …
e16623b8 size: 28 previous size: 10 (Allocated) Ntfo
e16623e0 size: 8 previous size: 28 (Free) CMDa
*e16623e8 size: 20 previous size: 8 (Allocated) *DRV
So why do we have the bugcheck here if the memory wasn’t paged out? This is
because page faults occur when pages are marked as invalid in page tables and not only
when they are paged out to a disk. We can check whether an address belongs to an
invalid page by using the !pte command:
1: kd> !pte e16623fc
VA e16623fc
PDE at 00000000C0603858 PTE at 00000000C070B310
contains 00000000F5434863 contains 00000000E817A8C2
pfn f5434 -DA KWEV not valid
We see that the 0th (Valid) bit is cleared, which means that the PTE marks the page as
invalid, and also that the 11th (Transition) bit is set, which marks the page as being on
the standby or modified list. When such a page is referenced and IRQL is less than 2, the
page is made valid and added to a process working set. We see the address as “valid” in
WinDbg because the page was not paged out and is present in the crash dump. But it is
marked as invalid and therefore triggers the page fault. The page fault handler sees that
IRQL == 2 and generates the D1 bugcheck.
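The !pte arithmetic above can be reproduced by hand. The following is a minimal Python sketch, assuming 32-bit x86 Windows with PAE (8-byte entries, page tables self-mapped at 0xC0000000 and page directories at 0xC0600000), which is consistent with the addresses in the output above. The function names and the returned strings are hypothetical, and only the two bits discussed in the text are decoded.

```python
# Sketch under stated assumptions: x86 PAE layout, simplified PTE decoding.
PTE_BASE = 0xC0000000     # assumed self-map base for page table entries
PDE_BASE = 0xC0600000     # assumed self-map base for page directory entries

VALID_BIT = 1 << 0        # bit 0: page is valid
TRANSITION_BIT = 1 << 11  # bit 11: page is on the standby or modified list

DISPATCH_LEVEL = 2


def pte_address(va):
    """Virtual address of the PTE that maps va (one 8-byte entry per 4 KB page)."""
    return PTE_BASE + (va >> 12) * 8


def pde_address(va):
    """Virtual address of the PDE that maps va (one 8-byte entry per 2 MB region)."""
    return PDE_BASE + (va >> 21) * 8


def describe_pte(pte):
    """Decode only the Valid and Transition bits discussed in the text."""
    return {
        "valid": bool(pte & VALID_BIT),
        "transition": bool(pte & TRANSITION_BIT),
    }


def access_outcome(pte, irql):
    """Hypothetical summary of what referencing the page leads to."""
    if describe_pte(pte)["valid"]:
        return "no page fault"
    if irql >= DISPATCH_LEVEL:
        return "page fault at IRQL >= 2: bugcheck"
    return "page fault resolved by the memory manager"


if __name__ == "__main__":
    va = 0xE16623FC
    print(f"PDE at {pde_address(va):08X}  PTE at {pte_address(va):08X}")
    print(describe_pte(0xE817A8C2), access_outcome(0xE817A8C2, 2))
```

With the values from the dump, the computed PDE and PTE addresses match the ones printed by !pte, and the PTE value decodes as not valid with the transition bit set.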
BUGCHECKS DEPICTED
NMI_HARDWARE_FAILURE
WinDbg help states that the NMI_HARDWARE_FAILURE (0x80) bugcheck indicates a
hardware fault. This description can easily lead to the conclusion that a kernel or a
complete crash dump we just got from our customer isn’t worth examining.
But hardware malfunction is not always the case, especially if our customer mentions
that their system was hanging and they forced a manual dump. Here I would advise
checking whether they have special hardware for debugging purposes, for example, a
card or an integrated iLO (Integrated Lights-Out) chip for remote server
administration. Both can generate an NMI (Non Maskable Interrupt) on demand and
therefore bugcheck the system. If this is the case then it is worth examining their dump to see
why the system was hanging.
IRQL_NOT_LESS_OR_EQUAL
During kernel debugging training I provided in the past I came up with the idea of
using UML sequence diagrams to depict various Windows kernel behavior including
bugchecks. I started with bugcheck A. To understand why this bugcheck is needed I started
explaining the difference between thread scheduling and IRQL and I used the following
diagram to illustrate it:
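[Diagram: thread scheduling versus IRQL. Thread 1 and Thread 2; interrupt B sets IRQL:=5]
Then I explained interrupt masking:
[Diagram: interrupt masking. Starting at IRQL=0; Device A and ISR A at DIRQL=10; Device B and ISR B at DIRQL=5 (<=10). Exit: pending unmasked interrupts? No]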
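The masking rule in the diagram can be illustrated with a toy model. This is only a sketch, assuming that an interrupt is delivered when its level is above the current IRQL and is otherwise kept pending until the IRQL drops; the Cpu class, its methods and the log strings are hypothetical and do not correspond to any kernel API.

```python
# Toy model of interrupt masking by IRQL (hypothetical names throughout).
class Cpu:
    def __init__(self):
        self.irql = 0
        self.saved = []    # IRQLs to restore when each running ISR exits
        self.pending = []  # (level, name) of interrupts that are masked
        self.log = []

    def interrupt(self, name, level):
        if level <= self.irql:
            # Masked: remember it until the IRQL is lowered.
            self.pending.append((level, name))
            self.log.append(f"{name}: masked (IRQL={self.irql})")
        else:
            self.saved.append(self.irql)
            self.irql = level
            self.log.append(f"{name}: ISR starts (IRQL:={level})")

    def exit_isr(self):
        self.irql = self.saved.pop()
        self.log.append(f"ISR exit (IRQL:={self.irql})")
        # Pending unmasked interrupts? Deliver the highest one, if any.
        unmasked = [p for p in self.pending if p[0] > self.irql]
        if unmasked:
            level, name = max(unmasked)
            self.pending.remove((level, name))
            self.interrupt(name, level)


def demo():
    cpu = Cpu()
    cpu.interrupt("Device A", 10)  # runs: 10 > 0
    cpu.interrupt("Device B", 5)   # masked: 5 <= 10
    cpu.exit_isr()                 # ISR A exits, Device B is now delivered
    cpu.exit_isr()                 # ISR B exits, nothing pending
    return cpu


if __name__ == "__main__":
    print("\n".join(demo().log))
```

The same rule explains why a software dispatch interrupt at level 2 stays pending while the processor is at DISPATCH_LEVEL or above.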
Next I explained thread scheduling (thread dispatcher):
[Diagram: Thread scheduling and DISPATCH_LEVEL. The clock interrupt sets IRQL:=CLOCK and, if the quantum has expired, requests a software dispatch interrupt. On exit (pending unmasked interrupts? Yes) the dispatcher runs at IRQL=2 (IRQL:=DISPATCH_LEVEL(2)), switches thread context and sets IRQL:=0. In the second scenario the kernel calls KeRaiseIrql(DISPATCH_LEVEL) while working with shared data, so the dispatch interrupt stays masked until KeLowerIrql(0) (pending unmasked interrupts? Yes).]
And finally I presented the diagram showing why bugcheck A happens and what
would have happened if it didn’t exist:
[Diagram: Bugcheck A (IRQL_NOT_LESS_OR_EQUAL). Starting at IRQL=0 the kernel calls KeRaiseIrql(DISPATCH_LEVEL), so IRQL:=DISPATCH_LEVEL(2). The clock interrupt (DIRQL=CLOCK, >2) requests a software dispatch interrupt if the quantum has expired; the dispatch interrupt is masked. RtlQueryRegistryValues() is called (CM), the registry data is paged out (page fault) and MM/CC would have to wait for disk I/O completion. Trap handler: IRQL >= 2? Yes: Bugcheck A. Note on the diagram: "This would be a deadlock because we never finish waiting. Thread scheduling is disabled when we are at DISPATCH_LEVEL."]
This bugcheck happens in the trap handler, and the IRQL checking before the bugcheck
happens in the memory manager, as you can see from the dump example below. There is no
IRQL checking in the disassembled handler so it must be in one of the Mm functions:
8046b189 call dword ptr [nt!_imp__KeGetCurrentIrql (8040063c)]
8046b18f lock inc dword ptr [nt!KiHardwareTrigger (80470cc0)]
8046b196 mov ecx,[ebp+0x64]
8046b199 and ecx,0x2
8046b19c shr ecx,1
8046b19e mov esi,[ebp+0x68]
8046b1a1 push esi
8046b1a2 push ecx
8046b1a3 push eax
8046b1a4 push edi
8046b1a5 push 0xa
8046b1a7 call nt!KeBugCheckEx (8042c1e2)
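The pushes before the KeBugCheckEx call can be read as the bugcheck code followed by its four arguments, pushed right to left. The Python sketch below mirrors that sequence. It assumes that edi holds the referenced memory address, that [ebp+0x64] is the page fault error code saved in the trap frame (bit 1 set for a write) and that [ebp+0x68] is the saved instruction pointer; these are assumptions consistent with the documented argument layout, not something the listing itself proves. The function name is hypothetical.

```python
# Sketch: map the register values in the listing to bugcheck 0xA arguments.
def bugcheck_a_arguments(fault_address, current_irql, error_code, fault_eip):
    # and ecx,0x2 / shr ecx,1: isolate bit 1 of the assumed error code,
    # giving 0 for a read operation and 1 for a write operation.
    is_write = (error_code & 0x2) >> 1
    # push esi, push ecx, push eax, push edi, push 0xa (right to left),
    # so the call is KeBugCheckEx(0xA, edi, eax, ecx, esi).
    return (0xA, fault_address, current_irql, is_write, fault_eip)


if __name__ == "__main__":
    # Hypothetical input values, for illustration only.
    code, arg1, arg2, arg3, arg4 = bugcheck_a_arguments(0x00001000, 2, 0x2, 0x80401234)
    print(f"Bugcheck {code:X}: {arg1:08x} {arg2:08x} {arg3:08x} {arg4:08x}")
```

Note that no comparison against the IRQL appears in this sequence: the current IRQL is only fetched and passed on as an argument, which agrees with the observation that the check itself must live elsewhere.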
KERNEL_MODE_EXCEPTION_NOT_HANDLED
Here is the next depicted bugcheck: 0x8E. It is very common in kernel crash
dumps and it means that:
1. If an access violation exception happened, the read or write address was in user
space.
2. Frame-based exception handling was allowed, a kernel debugger (if any) didn’t
handle the exception (first chance), then no exception handlers were willing to
process the exception, and at last the kernel debugger (if any) didn’t handle the
exception (second chance).
3. Frame-based exception handling wasn’t allowed and a kernel debugger (if any)
didn’t handle the exception.
The second option is depicted on the following UML sequence diagram:
[Diagram: Bugcheck 8E. PreviousMode == KernelMode? Yes. Is frame-based exception handling allowed? Yes. [nt!KiDebugRoutine] (nt!KdpStub or nt!KdpTrap), FirstChance: didn't handle. nt!RtlDispatchException: search for handlers and call them. Handled? No. [nt!KiDebugRoutine], SecondChance: didn't handle. nt!KeBugCheckEx: KERNEL_MODE_EXCEPTION_NOT_HANDLED (0x8E)]
Note: if we have an access violation and the read or write address is in kernel space
we get a different bugcheck, as explained in the Invalid Pointer pattern (page 267).
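The second and third options from the list above can be written down as a small decision function. This is only a sketch of the flow as described in the text, not of the actual KiDispatchException code; the function name, the stage labels (in particular "only chance" for the third option) and the returned strings are hypothetical, and the access violation address check from the first option is not modeled.

```python
# Sketch of the decision flow leading to bugcheck 0x8E (hypothetical names).
def kernel_exception_outcome(frame_based_allowed, debugger_handles, handler_found):
    """debugger_handles is a callable that takes a stage label and returns
    True when a kernel debugger is present and handles the exception."""
    if not frame_based_allowed:
        # Option 3: only the kernel debugger gets a chance.
        if debugger_handles("only chance"):
            return "handled by debugger"
        return "bugcheck 0x8E"
    # Option 2: debugger first chance, frame-based handlers, debugger second chance.
    if debugger_handles("first chance"):
        return "handled by debugger (first chance)"
    if handler_found:
        return "handled by exception handler"
    if debugger_handles("second chance"):
        return "handled by debugger (second chance)"
    return "bugcheck 0x8E"


def no_debugger(stage):
    """No kernel debugger attached: nothing is handled at any stage."""
    return False


if __name__ == "__main__":
    print(kernel_exception_outcome(True, no_debugger, handler_found=False))
```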
KMODE_EXCEPTION_NOT_HANDLED
This bugcheck (0x1E) is essentially the same as the KERNEL_MODE_EXCEPTION_NOT_HANDLED
(0x8E) bugcheck (page 141), although the parameters are different:
KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address as
well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8046ce72, The address that the exception occurred at
Arg3: 00000000, Parameter 0 of the exception
Arg4: 00000000, Parameter 1 of the exception
KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address as
well as the link date of the driver/image that contains this address. Some
common problems are exception code 0x80000003. This means a hard coded
breakpoint or assertion was hit, but this system was booted /NODEBUG. This
is not supposed to happen as developers should never have hardcoded
breakpoints in retail code, but … If this happens, make sure a debugger
gets connected, and the system is booted /DEBUG. This will let us see why
this breakpoint is happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 808cbb8d, The address that the exception occurred at
Arg3: f5a84638, Trap Frame
Arg4: 00000000
Bugcheck 0x1E is raised from the same routine, KiDispatchException, on
x64 Windows Server 2003 and on x86 Windows 2000 platforms, whereas 0x8E is raised
on x86 Windows Server 2003 and Vista platforms.
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
Another bugcheck that is similar to KMODE_EXCEPTION_NOT_HANDLED and
KERNEL_MODE_EXCEPTION_NOT_HANDLED is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED
(0x7E).
This bugcheck happens when you have an exception in a system thread and there
is no exception handler to catch it, i.e. no __try/__except handler. System threads are
created by calling the PsCreateSystemThread function. Here is its description from the DDK:
The PsCreateSystemThread routine creates a system thread that executes in
kernel mode and returns a handle for the thread.
By default the PspUnhandledExceptionInSystemThread function is set as a default
exception handler and its purpose is to call KeBugCheckEx.
The typical call stack in dumps with 7E bugcheck is:
To see how this bugcheck is generated from a processor trap we need to look at
the raw stack. Let’s look at an example. The !analyze -v command gives us the following
output:
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address as
well as the link date of the driver/image that contains this address.
Arguments:
Arg1: 80000003, The exception code that was not handled
Arg2: f69d9dd7, The address that the exception occurred at
Arg3: f70708c0, Exception Record Address
Arg4: f70705bc, Context Record Address
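The argument layouts quoted for the three related bugchecks can be put side by side. The Python sketch below is a small lookup built only from the !analyze -v descriptions shown in this section; the table and function names are hypothetical, and the two exception code names come from the access violation and breakpoint cases mentioned in the text.

```python
# Sketch: argument meanings of bugchecks 1E, 8E and 7E as quoted in the text.
EXCEPTION_CODES = {
    0xC0000005: "access violation",
    0x80000003: "breakpoint",
}

BUGCHECK_ARGS = {
    0x1E: ("KMODE_EXCEPTION_NOT_HANDLED",
           ["exception code", "exception address",
            "parameter 0 of the exception", "parameter 1 of the exception"]),
    0x8E: ("KERNEL_MODE_EXCEPTION_NOT_HANDLED",
           ["exception code", "exception address", "trap frame", None]),
    0x7E: ("SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
           ["exception code", "exception address",
            "exception record address", "context record address"]),
}


def describe_bugcheck(code, args):
    """Return one line for the bugcheck name and one line per argument."""
    name, labels = BUGCHECK_ARGS[code]
    lines = [f"{name} ({code:x})"]
    for index, (value, label) in enumerate(zip(args, labels), start=1):
        text = f"Arg{index}: {value:08x}"
        if label:
            text += f", {label}"
        if index == 1 and value in EXCEPTION_CODES:
            text += f" ({EXCEPTION_CODES[value]})"
        lines.append(text)
    return lines


if __name__ == "__main__":
    for line in describe_bugcheck(0x7E, [0x80000003, 0xF69D9DD7, 0xF70708C0, 0xF70705BC]):
        print(line)
```

The main practical difference is where the context comes from: 8E supplies a trap frame, while 7E supplies exception record and context record addresses.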