Memory Dump Analysis Anthology - P9


CASE STUDY

Consider the following legacy C++/Win32 code fragment highlighted in WinDbg

after opening a crash dump:

1: HANDLE hFile = CreateFile(str.GetBuffer(), GENERIC_READ,

FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

13: if (bufferA[i] == 0xD && bufferA[i+1] != 0xA)

At first glance the code seems to be right: we open a file, get its size and allocate a buffer to read into. All loop indexes seem to be within array bounds too. Let’s look at the disassembly and the crash point:

004021be push esi

004021bf call dword ptr [component!_imp__GetFileSize (0042e26c)]

004021c5 mov edi,eax ; dwSize

004021c7 lea ebx,[edi+2] ; dwSize+2

004021ca push ebx

004021cb mov dword ptr [esp+34h],0

004021d3 call component!operator new[] (00408e35)

004021d8 push ebx

004021d9 mov ebp,eax ; bufferA

004021db push 0

004021dd push ebp

004021de call component!memset (00418500)

004021e3 add esp,10h

004021e6 push 0

004021e8 lea edx,[esp+34h]

004021ec push edx

004021ed push edi

004021ee push ebp

004021ef push esi

004021f0 call dword ptr [component!_imp__ReadFile (0042e264)]

004021f6 test eax,eax


00402333 add edi,0FFFFFFFBh ; +2-7 (edi contains dwSize)

00402336 cmp edi,esi ; loop condition

00402338 mov dword ptr [esp+24h],esi

0040233c jbe component!CMyDlg::OnTimer+0x43e (004023be)

00402342 mov al,byte ptr [esi+ebp] ; bufferA[i]

00402342 8a042e mov al,byte ptr [esi+ebp] ds:0023:0095e000=??

If we look at the ebx (dwSize+2) and edi (array upper bound, dwSize+2-7) registers we can easily see that dwSize was zero. Clearly we had a buffer overrun because the upper array bound was calculated as 0+2-7 = 0xFFFFFFFB (the loop index was an unsigned integer, DWORD). Had the comparison been done with signed integers (int) we wouldn’t have had any problem, because the condition 0 < 0+2-7 is always false and the loop body would never have been executed.

Based on that the following fix was proposed:

11: for (; i < dwSize+2-7; ++i)        // original line

11: for (; i < (int)dwSize+2-7; ++i)   // proposed fix

12: {
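The wraparound described above can be reproduced in isolation. The following is a minimal sketch in portable C++, not the original component code: std::uint32_t stands in for DWORD, and the helper names (unsigned_bound, signed_bound, loop_runs_unsigned, loop_runs_signed) are made up for illustration. The signed variant uses a 64-bit type so that every 32-bit size stays representable, which is an assumption of this sketch rather than part of the proposed fix.

```cpp
#include <cstdint>

// Upper bound of `for (i = 0; i < size + 2 - 7; ++i)` when the arithmetic is
// done on unsigned 32-bit values (DWORD). Wraps modulo 2^32 when size < 5.
std::uint32_t unsigned_bound(std::uint32_t size) {
    return size + 2 - 7;
}

// The same bound computed in signed arithmetic. A 64-bit signed type is used
// here so that every possible 32-bit size is representable.
std::int64_t signed_bound(std::uint32_t size) {
    return static_cast<std::int64_t>(size) + 2 - 7;
}

// Would the loop body execute for the first index (i == 0)?
bool loop_runs_unsigned(std::uint32_t size) {
    return 0u < unsigned_bound(size);
}

bool loop_runs_signed(std::uint32_t size) {
    return 0 < signed_bound(size);
}
```

For an empty file, unsigned_bound(0) yields 0xFFFFFFFB and the loop is entered, whereas signed_bound(0) yields -5 and the loop is skipped. Note that a cast on the bound only helps if the comparison itself is performed in signed arithmetic: if the index stays unsigned, the usual arithmetic conversions turn the negative bound back into a large unsigned value.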


GetFileSize can return INVALID_FILE_SIZE (0xFFFFFFFF) and operator new can theoretically fail (if the size is too big), so we can correct the code even further:

2: if (hFile != INVALID_HANDLE_VALUE)

3: {

4: DWORD dwSize = GetFileSize(hFile, NULL);

4a: if (dwSize != INVALID_FILE_SIZE)
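The additional checks can be sketched without the Win32 API. This is a hedged illustration, not the corrected listing from the case study: kInvalidFileSize is a stand-in for INVALID_FILE_SIZE, and allocate_read_buffer, scan_limit and count_bare_cr are hypothetical helpers. The scan condition is the one from line 13 of the fragment, and the bound mirrors dwSize+2-7 but is clamped to zero instead of being allowed to wrap.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <new>

// Stand-in for the Win32 INVALID_FILE_SIZE sentinel (0xFFFFFFFF).
constexpr std::uint32_t kInvalidFileSize = 0xFFFFFFFFu;

// Allocates a zero-filled buffer of size + 2 bytes. Returns nullptr when the
// size is the error sentinel, when size + 2 does not fit into size_t, or when
// the allocation itself fails.
std::unique_ptr<unsigned char[]> allocate_read_buffer(std::uint32_t size) {
    if (size == kInvalidFileSize) return nullptr;
    if (size > std::numeric_limits<std::size_t>::max() - 2) return nullptr;
    const std::size_t bytes = static_cast<std::size_t>(size) + 2;
    return std::unique_ptr<unsigned char[]>(
        new (std::nothrow) unsigned char[bytes]());
}

// Number of indexes the scan loop may visit: size + 2 - 7, clamped to zero so
// that small files never produce a wrapped-around bound.
std::size_t scan_limit(std::uint32_t size) {
    const std::uint64_t padded = static_cast<std::uint64_t>(size) + 2;
    return padded > 7 ? static_cast<std::size_t>(padded - 7) : 0;
}

// Counts CR bytes (0x0D) that are not followed by LF (0x0A). The buffer is
// expected to come from allocate_read_buffer(size).
std::size_t count_bare_cr(const unsigned char* buffer, std::uint32_t size) {
    if (buffer == nullptr) return 0;
    std::size_t count = 0;
    const std::size_t limit = scan_limit(size);
    for (std::size_t i = 0; i < limit; ++i) {
        if (buffer[i] == 0x0D && buffer[i + 1] != 0x0A) ++count;
    }
    return count;
}
```

With these helpers an empty file gives scan_limit(0) == 0, so the loop body is never entered, and a failed size query or allocation is reported as a null buffer instead of being dereferenced.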


DETECTING LOOPS IN CODE

Sometimes when we look at a stack trace and disassembled code we see that a crash couldn’t have happened if the code path was linear. In such cases we need to see if there is any loop that changes some variables. This is greatly simplified if we have source code, but in cases where we don’t have access to source code it is still possible to detect loops. We just need to find a direct (JMP) or conditional jump instruction (Jxxx, for example, JE) after the crash point branching to the beginning of the loop before the crash point, as shown in the following pseudo code:

set the pointer value

Let’s look at one example I found very interesting because it also shows the thiscall calling convention for C++ code generated by the Visual C++ compiler. Before we look at the dump, let me quickly remind you how C++ non-static class methods are called. Let’s first look at a non-virtual method call.

class A

{

public:

int foo() { return i; }

virtual int bar() { return i; }

private:

int i;

};

Internally class members are accessed via implicit this pointer (passed via ECX):

int A::foo() { return this->i; }

Suppose we have an object instance of class A and we call its foo method:

A obj;

obj.foo();


The compiler has to generate code that calls the foo function, and the code inside the function has to know which object it is associated with. So internally the compiler passes an implicit parameter, a pointer to that object. In pseudo code:

int foo_impl(A *this)

In x86 assembly language it should be similar to this code:

lea ecx, obj

call foo_impl

If we have obj declared as a local variable the code is similar:

lea ecx, [ebp-N]

call foo_impl

If we have a pointer to an obj then the compiler usually generates a MOV instruction instead of an LEA instruction:

A *pobj;

pobj->foo();

mov ecx, [ebp-N]

call foo_impl

If we have other function parameters they are pushed on the stack from right to left. This is the thiscall calling convention. For a virtual function call we have an indirect call through a virtual function table. The pointer to it is the first object layout member, and in the latter case, where the pointer to obj is declared as a local variable, we have the following x86 code:

A *pobj;

pobj->bar();

mov ecx, [ebp-N]

mov eax, [ecx]

call [eax]
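The two call shapes above can be modelled by hand in C++. This is only a sketch of the idea, not what the Visual C++ compiler literally emits: AModel, VTable, foo_impl, bar_impl, make_a, call_foo and call_bar are hypothetical names, and the explicit self parameter plays the role of the this pointer passed in ECX.

```cpp
// Hand-written model of class A: the table pointer is the first member of the
// object layout, followed by the data member i.
struct AModel;

// One slot per virtual function; the first slot corresponds to bar.
struct VTable {
    int (*bar)(const AModel* self);
};

struct AModel {
    const VTable* vptr;
    int i;
};

// Bodies of the methods with the implicit this pointer made explicit.
int foo_impl(const AModel* self) { return self->i; }
int bar_impl(const AModel* self) { return self->i; }

const VTable kAVTable = { &bar_impl };

AModel make_a(int value) { return AModel{ &kAVTable, value }; }

// pobj->foo(): direct call, roughly `mov ecx,[pobj]` / `call foo_impl`.
int call_foo(const AModel* pobj) { return foo_impl(pobj); }

// pobj->bar(): indirect call through the table, roughly
// `mov ecx,[pobj]` / `mov eax,[ecx]` / `call [eax]`.
int call_bar(const AModel* pobj) { return pobj->vptr->bar(pobj); }
```

The model also shows where a NULL object pointer would fault: call_bar has to read pobj->vptr at the call site before any call is made, whereas call_foo only touches the object inside foo_impl.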


Now let’s look at the crash point and stack trace:

67dc5d55 push offset component!CreateErrorInfo+0x553 (67ded93b)

67dc5d5a mov eax,dword ptr fs:[00000000h]

67dc5d60 push eax

67dc5d61 mov dword ptr fs:[0],esp

67dc5d68 sub esp,240h

67dc5d6e mov eax,dword ptr [component!__security_cookie (67e0113c)]

67dc5d73 mov dword ptr [ebp-10h],eax

67dc5d76 mov eax,dword ptr [ebp+8]


67dc5da5 mov dword ptr [ebp-240h],eax

67dc5dab push 5Ch

67dc5dad lea ecx,[ebp-244h]

67dc5db3 mov dword ptr [ebp-4],0

67dc5dba call component!CStrToken::Next (67dc4f80)

If we trace EBX backwards we would see that it comes from ECX, so ECX could be considered an implicit this pointer according to the thiscall calling convention. Therefore it looks like the caller passed a NULL this pointer via ECX.

Let’s look at the caller. To see the code we can either disassemble FindFirstFileW or disassemble backwards at the GetDirectory return address. We’ll do the latter:

004074e0 mov ecx,dword ptr [esi+8E4h]

004074e6 mov eax,dword ptr [ecx]

004074e8 push 0

004074ea push 0

004074ec push edx

004074ed call dword ptr [eax+10h]


We see that ECX is our this pointer. However, the virtual table pointer is taken from the memory it references:

004074e6 mov eax,dword ptr [ecx]

…

004074ed call dword ptr [eax+10h]

Were ECX NULL we would have had our crash at this point. However, we have our crash in the called function, so it couldn’t be NULL. There is a contradiction here. The only plausible explanation is that in the GetDirectory function there is a loop that changes EBX (shown in bold in the GetDirectory function code above). If we take a second look at the code we see that EBX is saved in the [ebp-238h] local variable before it is used:

67dc5d55 push offset component!CreateErrorInfo+0x553 (67ded93b)

67dc5d5a mov eax,dword ptr fs:[00000000h]

67dc5d60 push eax

67dc5d61 mov dword ptr fs:[0],esp

67dc5d68 sub esp,240h

67dc5d6e mov eax,dword ptr [component!__security_cookie (67e0113c)]

67dc5d73 mov dword ptr [ebp-10h],eax

67dc5d76 mov eax,dword ptr [ebp+8]

67dc5d9f mov dword ptr [ebp-244h],eax

67dc5da5 mov dword ptr [ebp-240h],eax

67dc5dab push 5Ch

67dc5dad lea ecx,[ebp-244h]

67dc5db3 mov dword ptr [ebp-4],0

67dc5dba call component!CStrToken::Next (67dc4f80)


If we look further past the crash point we see that the [ebp-238h] value is changed and then used again to change EBX:

67dc5e6e mov eax,dword ptr [ebp-23Ch]

67dc5e74 mov ecx,dword ptr [eax]

67dc5e76 mov dword ptr [ebp-238h],ecx

67dc5e7c jmp component!CDirectory::GetDirectory+0x20e (67dc5f5e)

We see that after changing EBX the code jumps to the 67dc5dd0 address, and this address is just before our crash point. It looks like a loop. Therefore there is no contradiction. ECX as the this pointer was passed as a non-NULL and valid pointer. Before the loop started its value was passed to EBX. In the loop body EBX was changed, and after some loop iterations the new value became NULL. It could be the case that there were no checks for NULL pointers in the loop code.
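The pattern behind this conclusion can be sketched in source form. The code below is a hypothetical illustration and not the real GetDirectory logic: Node and sum_first are invented names, and the local variable current plays the role of EBX, which is valid on entry and overwritten on every iteration.

```cpp
#include <cstddef>

// Hypothetical singly linked node; not the real CDirectory data structure.
struct Node {
    Node* next;
    int value;
};

// Sums up to max_steps node values starting at head. Sets *hit_null to true
// if the pointer became null before max_steps iterations were done.
int sum_first(const Node* head, std::size_t max_steps, bool* hit_null) {
    const Node* current = head;   // valid on entry, like ECX copied to EBX
    int sum = 0;
    *hit_null = false;
    for (std::size_t step = 0; step < max_steps; ++step) {
        if (current == nullptr) { // without this check the next line faults
            *hit_null = true;
            break;
        }
        sum += current->value;    // the "crash point": dereference of current
        current = current->next;  // the loop body changes the pointer
    }                             // backward jump to the start of the loop
    return sum;
}
```

If the null check is removed, the dereference faults on a later iteration even though the caller passed a perfectly valid pointer, which matches the shape of the crash analyzed above: a backward jump after the crash point and a pointer register reloaded inside the loop.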


CRASH DUMP ANALYSIS CHECKLIST

Often the root cause of a problem is not obvious from a memory dump. Here is the first version of a crash dump analysis checklist to help experienced engineers not miss any important information. The checklist doesn’t prescribe any specific steps; it just lists all possible points to double check when looking at a memory dump.

General:

• Internal database(s) search

• Google or Microsoft search for suspected components as this could be a known issue. Sometimes a simple search immediately points to the fix on a vendor’s site

• The tool used to save a dump (to flag false positive, incomplete or inconsistent dumps)

Application crash or hang:

• Default analysis (!analyze -v or !analyze -v -hang for hangs)

• Critical sections (!locks) for both crashes and hangs

• Component timestamps. DLL Hell?

• Do any newer components exist?

• Process threads (~*kv or !uniqstack)

• Process uptime

• Your components on the full raw stack of the problem thread

• Your components on the full raw stack of the main application thread

• Process size

• Number of threads

• Gflags value (!gflag)

• Time consumed by thread (!runaway)

• Environment (!peb)

• Import table (!dh)

• Hooked functions (!chkimg)

• Exception handlers (!exchain)


System hang:

• Default analysis (!analyze -v -hang)

• ERESOURCE contention (!locks)

• Processes and virtual memory including session space (!vm 4)

• Pools (!poolused)

• Waiting threads (!stacks)

• Critical system queues (!exqueue f)

• I/O (!irpfind)

• The list of all thread stack traces (!process 0 ff for W2K3/XP/Vista, ListProcessStacks script for Windows 2000, see page 222)

• LPC chain for suspected threads (!lpc message)

• Critical sections for suspected processes (!ntsdexts.locks)

• Sessions, session processes (!session, !sprocess)

• Processes (size, handle table size) (!process 0 0)

• Running threads (!running)

• Ready threads (!ready)

• DPC queues (!dpcs)

• The list of APCs (!apc)

BSOD:

• Default analysis (!analyze -v)

• Pool address (!pool)

• Component timestamps

• Processes and virtual memory (!vm 4)

• Current threads on other processors

• Raw stack

• Bugcheck description (including ln exception address for corrupt or truncated dumps)


CRASH DUMP ANALYSIS POSTER (HTML VERSION)

There is an HTML version of the Crash Dump Analysis Poster with hyperlinks. Command links launch WinDbg Help for the corresponding topic. If you click on !heap, for example, the WinDbg Help window for that command will open. In order to have this functionality you need to save the source code of the HTML file to your disk and launch it locally. Its link is http://www.dumpanalysis.org/CDAPoster.html or simply go to windbg.org to locate it.

Note: Your WinDbg Help file must be in the default installation path, i.e.

C:\Program Files\Debugging Tools for Windows\debugger.chm

If you installed WinDbg to a different folder then you can simply create the default folder and copy debugger.chm there.

I keep this HTML file open locally on a second monitor and find it very easy to jump to the appropriate command help when I need its parameter description.

PART 3: CRASH DUMP ANALYSIS PATTERNS

MULTIPLE EXCEPTIONS

After doing crash dump analysis for some time I decided to organize my knowledge into a set of patterns (a memory dump analysis pattern language, so to speak) and thereby try to facilitate a common vocabulary.

What is a pattern? It is a general solution we can apply in a specific context to a common recurrent problem.

The first pattern I’m going to introduce today is Multiple Exceptions. This pattern captures the known fact that there could be as many exceptions (“crashes”) as there are threads in a process. The following UML diagram depicts the relationship between Process, Thread and Exception entities:

Every process in Windows has at least one execution thread, so there could be at least one exception per thread (like an invalid memory reference) if things go wrong. There could be a second exception in that thread if the exception handling code experiences another exception, or if the first exception was handled and you have another one, and so on.

So what is the general solution to that common problem when an application or service crashes and we have a crash dump file (common recurrent problem) from a customer (specific context)? The general solution is to look at all threads and their stacks and not rely on what tools say.

Here is a concrete example from one of the dumps. Internet Explorer crashed, and I opened the dump in WinDbg and ran the !analyze -v command. This is what I got:


A break instruction, we might think, shows that the dump was taken manually from the running application and there was no crash: the customer sent the wrong dump or misunderstood the troubleshooting instructions. However, I looked at all threads and noticed the following two stacks (threads 15 and 16):

We see here that the real crash happened in componentA.dll, and componentB.dll or mshtml.dll might have influenced that. Why did this happen? The customer might have dumped Internet Explorer manually while it was displaying an exception message box. NtRaiseHardError displays a message box containing an error message.

Perhaps something else happened. Many cases where we see multiple thread exceptions in one process dump happen because crashed threads displayed message boxes, like the Visual C++ debug message box, and so prevented that process from terminating. In the dump under discussion the WinDbg automatic analysis command recognized only the last breakpoint exception (shown as # 16). In conclusion, we shouldn’t rely on “automatic analysis” too often anyway.


DYNAMIC MEMORY CORRUPTION

The next pattern I would like to discuss is Dynamic Memory Corruption (and its user and kernel variants called Heap Corruption and Pool Corruption). It is ubiquitous, its manifestations are random, and crashes usually happen far away from the original corruption point. In the user mode part of the exception thread stacks (don’t forget about the Multiple Exceptions pattern, page 255) you would see something like this:

or any similar variants, and we need to know the exact component that corrupted the application heap (which usually is not the same as the componentA.dll we see in the crashed thread stack).

For this common recurrent problem we have a general solution: enable heap checking. This general solution has many variants applied in a specific context:

• parameter value checking for heap functions

• user space software heap checks before or after certain checkpoints (like “malloc”/“new” and/or “free”/“delete” calls), usually implemented by checking various fill patterns, etc.

• hardware/OS supported heap checks (like using guard and nonaccessible pages to trap buffer overruns)
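The software fill pattern variant from the list can be illustrated with a small sketch. It is an assumption-laden toy, not how the Windows heap manager or any particular debug runtime implements its checks: GuardedBuffer, kGuardByte and kGuardSize are hypothetical, and the guard area lives inside the same allocation so that the demonstration overrun stays within owned memory.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

constexpr unsigned char kGuardByte = 0xFD;  // arbitrary fill value for this sketch
constexpr std::size_t kGuardSize = 8;       // guard bytes placed after the user area

// User bytes followed by a guard area filled with a known pattern.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t size)
        : size_(size), storage_(size + kGuardSize, 0) {
        std::memset(storage_.data() + size_, kGuardByte, kGuardSize);
    }

    unsigned char* data() { return storage_.data(); }
    std::size_t size() const { return size_; }

    // Checkpoint: true if every guard byte still holds the fill pattern.
    bool guard_intact() const {
        for (std::size_t i = 0; i < kGuardSize; ++i) {
            if (storage_[size_ + i] != kGuardByte) return false;
        }
        return true;
    }

private:
    std::size_t size_;
    std::vector<unsigned char> storage_;
};
```

A check like guard_intact() only reports the damage at the next checkpoint, possibly long after the offending write, and it misses an overrun that happens to write the fill value itself. A guard page, by contrast, faults at the instruction that performs the overrun.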

The hardware/OS supported variant is the most used in my experience, mainly because most heap corruptions originate from buffer overflows, and it is easier to rely on instant MMU support than on checking fill patterns. Here is the article from the Citrix support web site describing how we can enable full page heap. It uses a specific process as an example, the Citrix Independent Management Architecture (IMA) service, but we can substitute any application name we are interested in debugging:
