
Memory Dump Analysis Anthology - P5


Title: Interrupts And Exceptions Explained


It shows the presence of kernel32!UnhandledExceptionFilter calls. Let’s open TestDefaultDebugger.exe in WinDbg, put a breakpoint on the UnhandledExceptionFilter function and trace the execution. We have to change the return value of IsDebugPortPresent to simulate the normal fault handling logic when no active debugger is attached:

0:000> bp kernel32!UnhandledExceptionFilter

0:000> g

(fb0.1190): Access violation - code c0000005 (first chance)

First chance exceptions are reported before any exception handling

This exception may be expected and handled

eax=00000000 ebx=00000001 ecx=0012fe70 edx=00000000 esi=00425ae8


77655a3a jne kernel32!UnhandledExceptionFilter+0x22 (776559a6) [br=0]

Next, we continue to step over using the p command until we see the WerpReportExceptionInProcessContext function and step into it:


At this point if we look at the stack trace we would see:

After that we step over again and find that the code flow returns from all exception handlers until the KiUserExceptionDispatcher function raises the exception again via a ZwRaiseException call.

So it looks like the default unhandled exception filter in Vista only reports the exception and doesn’t launch the error reporting process that displays the error box, WerFault.exe.

If we click on the Debug button on the error reporting dialog to launch the postmortem debugger (I have Visual Studio Just-In-Time Debugger configured in the AeDebug\Debugger registry key) and look at its parent process by using Process Explorer, for example, we would see it is WerFault.exe, which in turn has svchost.exe as its parent.

Now we quit WinDbg and launch the TestDefaultDebugger application again, push its big crash button and, when the error reporting dialog appears, attach another instance of WinDbg to the svchost.exe process hosting the Windows Error Reporting Service (wersvc.dll).


We see the following threads:


4 Id: f8c.1b38 Suspend: 1 Teb: 7ffdb000 Unfrozen

ChildEBP RetAddr

00d3fe08 77a10850 ntdll!KiFastSystemCallRet

00d3fe0c 77a1a1b4 ntdll!NtWaitForWorkViaWorkerFactory+0xc

Next, if we look at the CWerService::ReportCrashKernelMsg code we would see that it calls CWerService::ReportCrash, which in turn loads faultrep.dll:

71cb6f17 push dword ptr [ebp-34h]

71cb6f1a push dword ptr [ebp-2Ch]

71cb6f1d call dword ptr [wersvc!_imp__GetCurrentProcessId (71cb1120)]

71cb7045 mov dword ptr [ebp-4],edi

71cb7048 push offset wersvc!`string' (71cb711c)

71cb704d call dword ptr [wersvc!_imp__LoadLibraryW (71cb1144)]

71cb7053 mov dword ptr [ebp-2Ch],eax

71cb7056 cmp eax,edi

71cb7058 je wersvc!CWerService::ReportCrash+0x52 (71cb9b47)

wersvc!CWerService::ReportCrash+0x88:

71cb705e push offset wersvc!`string' (71cb7100)

71cb7063 push eax

71cb7064 call dword ptr [wersvc!_imp__GetProcAddress (71cb1140)]

71cb706a mov ebx,eax

0015de60 77a10690 ntdll!KiFastSystemCallRet

0015de64 77607e09 ntdll!ZwWaitForMultipleObjects+0xc


1 Id: 1bfc.894 Suspend: 1 Teb: 7ffde000 Unfrozen

ChildEBP RetAddr

024afbf8 77a10690 ntdll!KiFastSystemCallRet

024afbfc 77607e09 ntdll!ZwWaitForMultipleObjects+0xc

024afc98 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0x11d

024afcec 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0x13c

024afd0c 74fa2cb6 DUser!CoreSC::Wait+0x59

024afd34 74fa2c55 DUser!CoreSC::WaitMessage+0x54

024afe40 75036beb comctl32!SHFusionDialogBoxIndirectParam+0x2d

024afe74 6d4a65a4 comctl32!CTaskDialog::Show+0x100

024afebc 6d4acb72 wer!IsolationAwareTaskDialogIndirect+0x64

024aff4c 6d4acc39 wer!CInitialConsentUI::InitialDlgThreadRoutine+0x369

024aff54 77603833 wer!CInitialConsentUI::Static_InitialDlgThreadRoutine+0xd

024aff60 779ea9bd kernel32!BaseThreadInitThunk+0xe

2 Id: 1bfc.1a04 Suspend: 1 Teb: 7ffdc000 Unfrozen

ChildEBP RetAddr

012bf998 77a10690 ntdll!KiFastSystemCallRet

012bf99c 77607e09 ntdll!ZwWaitForMultipleObjects+0xc

012bfa38 77b6c4b7 kernel32!WaitForMultipleObjectsEx+0x11d

012bfa8c 74fa161a USER32!RealMsgWaitForMultipleObjectsEx+0x13c

012bfaac 74fa1642 DUser!CoreSC::Wait+0x59

012bfae0 74fac442 DUser!CoreSC::xwProcessNL+0xaa

Next, we put a breakpoint on CreateProcess, push the Debug button on the error reporting dialog and, upon the breakpoint hit, inspect the CreateProcess parameters:

0:003> .asm no_code_bytes

Assembly options: no_code_bytes


ESP points to the return address, ESP+4 points to the first CreateProcess parameter and ESP+8 points to the second parameter. The thread stack now involves faultrep.dll:

Therefore it looks like the calls to the faultrep.dll module to report faults and launch the postmortem debugger were moved from UnhandledExceptionFilter to WerFault.exe in Vista.

Finally, let’s go back to our UnhandledExceptionFilter function. If we disassemble it we would see that it can call kernel32!WerpLaunchAeDebug too:

77655c5f push dword ptr [ebp-28h]

77655c62 push dword ptr [ebp-1Ch]

77655c65 push dword ptr [ebx+4]

77655c68 push dword ptr [ebx]


77655c6a push 0FFFFFFFEh

77655c6c call kernel32!GetCurrentProcess (775e9145)

77655c92 mov eax,dword ptr [ebx]

77655c94 push dword ptr [eax]

77655c96 push 0FFFFFFFFh

77655c98 call dword ptr [kernel32!_imp__NtTerminateProcess (775c14bc)]

If we look at the WerpLaunchAeDebug code we would see that it calls CreateProcess too and the code is the same as in faultrep.dll. This could mean that faultrep.dll imports that function from kernel32.dll. Therefore some postmortem debugger launching code is still present in the default unhandled exception filter, perhaps for compatibility or in case WER doesn’t work or is disabled.

A high-level description of the differences between Windows XP and Vista application crash support can be found in the following article by Mark Russinovich:

Inside the Windows Vista Kernel: Part 3 (Enhanced Crash Support)

(http://www.microsoft.com/technet/technetmag/issues/2007/04/VistaKernel/)


ANOTHER LOOK AT PAGE FAULTS

I recently observed this bugcheck with a reported “valid” address (Arg1, e16623fc):

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

An attempt was made to access a pageable (or completely invalid) address at an

interrupt request level (IRQL) that is too high. This is usually

caused by drivers using improper addresses.

If kernel debugger is available get stack backtrace.

Arguments:

Arg1: e16623fc, memory referenced

Arg2: 00000002, IRQL

Arg3: 00000000, value 0 = read operation, 1 = write operation

Arg4: ae2b222e, address which referenced memory

TRAP_FRAME: a54a4a40 (.trap 0xffffffffa54a4a40)

Pool page e16623fc region is Paged pool

e1662000 size: 3a8 previous size: 0 (Allocated) NtfF

e16623a8 size: 10 previous size: 3a8 (Free) …

e16623b8 size: 28 previous size: 10 (Allocated) Ntfo

e16623e0 size: 8 previous size: 28 (Free) CMDa

*e16623e8 size: 20 previous size: 8 (Allocated) *DRV

So why do we have the bugcheck here if the memory wasn’t paged out? This is because page faults occur when pages are marked as invalid in page tables, not only when they are paged out to disk. We can check whether an address belongs to an invalid page by using the !pte command:


1: kd> !pte e16623fc

VA e16623fc

PDE at 00000000C0603858 PTE at 00000000C070B310

contains 00000000F5434863 contains 00000000E817A8C2

pfn f5434 -DA KWEV not valid

We see that the 0th (Valid) bit is cleared, which means that the PTE marks the page as invalid, and also that the 11th bit (Transition) is set, which marks the page as being on the standby or modified list. When referenced at an IRQL less than 2, the page will be made valid and added to a process working set. We see the address as “valid” in WinDbg because the page was not paged out and is present in the crash dump. But it is marked as invalid and therefore triggers the page fault. The page fault handler sees that IRQL == 2 and generates the D1 bugcheck.
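The bit test described above can be reproduced by hand. The following is a minimal Python sketch, not WinDbg's own logic: it assumes the x86 PTE layout in which bit 0 is Valid and, for an invalid PTE, bit 10 is Prototype and bit 11 is Transition. The helper name decode_pte is hypothetical, and the Prototype check is an addition that the text above does not mention.

```python
VALID_BIT = 1 << 0        # hardware "present" bit
PROTOTYPE_BIT = 1 << 10   # software bit (assumed layout), only meaningful when Valid is 0
TRANSITION_BIT = 1 << 11  # software bit, only meaningful when Valid is 0


def decode_pte(pte: int) -> dict:
    """Classify a raw PTE value by the Valid, Prototype and Transition bits."""
    valid = bool(pte & VALID_BIT)
    prototype = (not valid) and bool(pte & PROTOTYPE_BIT)
    transition = (not valid) and (not prototype) and bool(pte & TRANSITION_BIT)
    return {"valid": valid, "prototype": prototype, "transition": transition}


if __name__ == "__main__":
    # PTE contents reported by !pte for address e16623fc
    print(decode_pte(0xE817A8C2))
```

For the PTE value from the dump, the sketch reports the page as not valid and in transition, which matches the reading above; the PDE value 0xF5434863 has bit 0 set and decodes as valid.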


BUGCHECKS DEPICTED

NMI_HARDWARE_FAILURE

WinDbg help states that the NMI_HARDWARE_FAILURE (0x80) bugcheck indicates a hardware fault. This description can easily lead to the conclusion that a kernel or a complete crash dump we just got from our customer isn’t worth examining.

But hardware malfunction is not always the case, especially if our customer mentions that their system was hanging and they forced a manual dump. Here I would advise checking whether they have special hardware for debugging purposes, for example, a card or an integrated iLO chip (Integrated Lights-Out) for remote server administration. Both can generate an NMI (Non Maskable Interrupt) on demand and therefore bugcheck the system. If this is the case then it is worth examining their dump to see why the system was hanging.


IRQL_NOT_LESS_OR_EQUAL

During kernel debugging training I provided in the past, I came up with the idea of using UML sequence diagrams to depict various Windows kernel behavior, including bugchecks. I started with bugcheck A. To understand why this bugcheck is needed, I started by explaining the difference between thread scheduling and IRQL, and I used the following diagram to illustrate it:

[Diagram: UML sequence diagram with Thread 1, Thread 2 and interrupt B (IRQL:=5)]


Then I explained interrupt masking:

[Diagram: interrupt masking. Participants: Device A, ISR A, Device B, ISR B. Labels: IRQL=0, DIRQL=10, DIRQL=5 (<=10). Exit: pending unmasked interrupts? No]


Next I explained thread scheduling (thread dispatcher):

[Diagram: “Thread scheduling and DISPATCH_LEVEL”. Participants: Kernel, Clock Interrupt, Dispatcher (IRQL=2). First scenario: Clock Interrupt, IRQL:=CLOCK; if quantum has expired request software dispatch interrupt; Exit: pending unmasked interrupts? Yes; IRQL:=DISPATCH_LEVEL(2); switch thread context; IRQL:=0. Second scenario: KeRaiseIrql(DISPATCH_LEVEL), IRQL:=DISPATCH_LEVEL(2), working with shared data; Clock Interrupt, IRQL:=CLOCK; if quantum has expired request software dispatch interrupt; Exit: pending unmasked interrupts? No, Dispatch interrupt (masked); KeLowerIrql(0): pending unmasked interrupts? Yes; IRQL:=DISPATCH_LEVEL(2); switch thread context; IRQL:=0]
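The masking behavior shown in the scheduling diagram can be imitated with a toy model. The following Python sketch is only an illustration under simplifying assumptions (one processor, an interrupt is delivered only when its level is strictly higher than the current IRQL, anything else stays pending). The class Cpu and its method names are hypothetical and are not kernel APIs.

```python
PASSIVE_LEVEL = 0
DISPATCH_LEVEL = 2


class Cpu:
    """Toy single-processor model of IRQL-based interrupt masking."""

    def __init__(self):
        self.irql = PASSIVE_LEVEL
        self.pending = set()   # (level, name) pairs waiting to be delivered
        self.log = []

    def raise_irql(self, new_level):
        if new_level < self.irql:
            raise ValueError("raise_irql cannot lower the IRQL")
        old, self.irql = self.irql, new_level
        return old

    def lower_irql(self, new_level):
        self.irql = new_level
        # Deliver every pending interrupt that is no longer masked, highest first.
        while True:
            ready = [p for p in self.pending if p[0] > self.irql]
            if not ready:
                break
            item = max(ready)
            self.pending.remove(item)
            self._service(*item)

    def request_interrupt(self, level, name):
        if level > self.irql:
            self._service(level, name)
        else:
            self.pending.add((level, name))
            self.log.append(f"{name} masked at IRQL {self.irql}")

    def _service(self, level, name):
        previous = self.irql
        self.irql = level
        self.log.append(f"{name} serviced at IRQL {level}")
        self.lower_irql(previous)


def demo():
    cpu = Cpu()
    cpu.raise_irql(DISPATCH_LEVEL)                     # protect shared data
    cpu.request_interrupt(DISPATCH_LEVEL, "dispatch")  # quantum expired, but masked
    cpu.lower_irql(PASSIVE_LEVEL)                      # pending dispatch interrupt fires now
    return cpu


if __name__ == "__main__":
    print(demo().log)
```

In this model the dispatch interrupt requested while the code holds DISPATCH_LEVEL is only recorded as pending, and the thread switch can happen only after the IRQL is lowered back to 0.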


And finally I presented the diagram showing why bugcheck A happens and what would have happened if it didn’t exist:

[Diagram: why bugcheck A (IRQL_NOT_LESS_OR_EQUAL) exists. Participants: Kernel (IRQL=0), Clock Interrupt (DIRQL=CLOCK, >2), Dispatcher (IRQL=2), CM, MM/CC, Trap Handler. Steps: KeRaiseIrql(DISPATCH_LEVEL), IRQL:=DISPATCH_LEVEL(2); Clock Interrupt, IRQL:=CLOCK; if quantum has expired request software dispatch interrupt; Exit: pending unmasked interrupts? No, Dispatch interrupt (masked); RtlQueryRegistryValues(); Registry data is paged out (page fault); Page Fault, Trap Handler: IRQL >= 2? Yes, Bugcheck A. Alternative without the bugcheck: Wait for disk I/O completion. This would be a deadlock because we never finish waiting. Thread scheduling is disabled when we are at DISPATCH_LEVEL.]


This bugcheck happens in the trap handler, and the IRQL check before the bugcheck happens in the memory manager, as you can see from the dump example below. There is no IRQL checking in the disassembled handler, so it must be in one of the Mm functions:

8046b189 call dword ptr [nt!_imp__KeGetCurrentIrql (8040063c)]

8046b18f lock inc dword ptr [nt!KiHardwareTrigger (80470cc0)]

8046b196 mov ecx,[ebp+0x64]

8046b199 and ecx,0x2

8046b19c shr ecx,1

8046b19e mov esi,[ebp+0x68]

8046b1a1 push esi

8046b1a2 push ecx

8046b1a3 push eax

8046b1a4 push edi

8046b1a5 push 0xa

8046b1a7 call nt!KeBugCheckEx (8042c1e2)
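The listing above can be read as assembling the KeBugCheckEx arguments. Below is a rough Python sketch of the same data flow. It assumes that [ebp+0x64] holds the x86 page-fault error code from the trap frame (bit 1 set for a write), that [ebp+0x68] holds the faulting instruction pointer, and that edi holds the referenced address. None of these register roles is stated in the listing itself, and the function name is hypothetical.

```python
IRQL_NOT_LESS_OR_EQUAL = 0xA


def bugcheck_a_arguments(referenced_address, current_irql, error_code, faulting_eip):
    """Mirror the push sequence: code, address, IRQL, read/write flag, EIP."""
    # and ecx,0x2 / shr ecx,1: keep only bit 1 of the error code (0 = read, 1 = write)
    write_flag = (error_code & 0x2) >> 1
    # Arguments are pushed right to left, so the last push (0xa) is the bugcheck code.
    return (IRQL_NOT_LESS_OR_EQUAL, referenced_address, current_irql,
            write_flag, faulting_eip)


if __name__ == "__main__":
    # Made-up values, for illustration only
    print([hex(v) for v in bugcheck_a_arguments(0x1000, 2, 0x2, 0x8046B000)])
```

Under these assumptions the four values after the bugcheck code line up with the usual argument layout of this bugcheck family: memory referenced, IRQL, read or write operation, and the address which referenced memory.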


KERNEL_MODE_EXCEPTION_NOT_HANDLED

Here is the next depicted bugcheck: 0x8E. It is very common in kernel crash dumps and it means that:

1. If an access violation exception happened, the read or write address was in user space.

2. Frame-based exception handling was allowed, a kernel debugger (if any) didn’t handle the exception (first chance), then no exception handlers were willing to process the exception, and at last the kernel debugger (if any) didn’t handle the exception (second chance).

3. Frame-based exception handling wasn’t allowed and a kernel debugger (if any) didn’t handle the exception.


The second option is depicted on the following UML sequence diagram:

[Diagram: Bugcheck 8E. PreviousMode == KernelMode? Yes. Is frame-based exception handling allowed? Yes. [nt!KiDebugRoutine](FirstChance), that is nt!KdpStub or nt!KdpTrap: didn't handle. nt!RtlDispatchException: search for handlers and call them. Handled? No. [nt!KiDebugRoutine](SecondChance): didn't handle. nt!KeBugCheckEx: KERNEL_MODE_EXCEPTION_NOT_HANDLED (0x8E)]
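The second option can also be written down as straight-line logic. The Python sketch below is a simplified model of that sequence only, not the actual nt!KiDispatchException code: the debugger is represented by a callable that is asked once per chance, frame-based handlers are callables returning True when they handle the exception, and all names are hypothetical.

```python
KERNEL_MODE_EXCEPTION_NOT_HANDLED = 0x8E


def dispatch_kernel_exception(debugger, handlers):
    """Model of the kernel-mode path when frame-based handling is allowed."""
    # First chance: give the kernel debugger (if any) a look at the exception.
    if debugger("first chance"):
        return "debugger handled (first chance)"
    # Search for frame-based handlers and call them in order.
    for handler in handlers:
        if handler():
            return "frame-based handler handled"
    # Second chance: the debugger gets one more opportunity.
    if debugger("second chance"):
        return "debugger handled (second chance)"
    # Nobody handled the exception.
    return f"bugcheck 0x{KERNEL_MODE_EXCEPTION_NOT_HANDLED:X}"


if __name__ == "__main__":
    no_debugger = lambda chance: False
    print(dispatch_kernel_exception(no_debugger, [lambda: False]))
```

With no debugger attached and no willing handler, the model falls through every stage and ends at the bugcheck.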

Note: if we have an access violation and the read or write address is in kernel space, we get a different bugcheck, as explained in the Invalid Pointer pattern (page 267).


KMODE_EXCEPTION_NOT_HANDLED

This bugcheck (0x1E) is essentially the same as the KERNEL_MODE_EXCEPTION_NOT_HANDLED (0x8E) bugcheck (page 141), although the parameters are different:

KMODE_EXCEPTION_NOT_HANDLED (1e)

This is a very common bugcheck. Usually the exception address pinpoints

the driver/function that caused the problem. Always note this address as

well as the link date of the driver/image that contains this address.

Arguments:

Arg1: c0000005, The exception code that was not handled

Arg2: 8046ce72, The address that the exception occurred at

Arg3: 00000000, Parameter 0 of the exception

Arg4: 00000000, Parameter 1 of the exception

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)

This is a very common bugcheck. Usually the exception address pinpoints

the driver/function that caused the problem. Always note this address as

well as the link date of the driver/image that contains this address. Some

common problems are exception code 0x80000003. This means a hard coded

breakpoint or assertion was hit, but this system was booted /NODEBUG. This

is not supposed to happen as developers should never have hardcoded

breakpoints in retail code, but … If this happens, make sure a debugger

gets connected, and the system is booted /DEBUG. This will let us see why

this breakpoint is happening.

Arguments:

Arg1: c0000005, The exception code that was not handled

Arg2: 808cbb8d, The address that the exception occurred at

Arg3: f5a84638, Trap Frame

Arg4: 00000000

Bugcheck 0x1E is called from the same routine, KiDispatchException, on x64 Windows Server 2003 and on x86 Windows 2000 platforms, whereas 0x8E is called on x86 Windows Server 2003 and Vista platforms.


SYSTEM_THREAD_EXCEPTION_NOT_HANDLED

Another bugcheck that is similar to KMODE_EXCEPTION_NOT_HANDLED and KERNEL_MODE_EXCEPTION_NOT_HANDLED is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (0x7E).

This bugcheck happens when you have an exception in a system thread and there is no exception handler to catch it, i.e. no __try/__except handler. System threads are created by calling the PsCreateSystemThread function. Here is its description from the DDK:

The PsCreateSystemThread routine creates a system thread that executes in kernel mode and returns a handle for the thread.

By default, the PspUnhandledExceptionInSystemThread function is set as the default exception handler and its purpose is to call KeBugCheckEx.

The typical call stack in dumps with 7E bugcheck is:

To see how this bugcheck is generated from a processor trap we need to look at the raw stack. Let’s look at an example. The !analyze -v command gives us the following output:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)

This is a very common bugcheck. Usually the exception address pinpoints

the driver/function that caused the problem. Always note this address as

well as the link date of the driver/image that contains this address.

Arguments:

Arg1: 80000003, The exception code that was not handled

Arg2: f69d9dd7, The address that the exception occurred at

Arg3: f70708c0, Exception Record Address

Arg4: f70705bc, Context Record Address

Posted: 15/12/2013, 11:15
