
Document: Memory Dump Analysis Anthology - P17


DOCUMENT INFORMATION

Basic information

Title: Wait Chain (General)
School: University of Information Technology
Major: Computer Science
Type: Thesis
Year of publication: 2023
City: Ho Chi Minh City
Format:
Pages: 30
Size: 711.75 KB


Wait Chain (General) 481

WAIT CHAIN (GENERAL)

The Wait Chain pattern is simply a sequence of causal relations between events: thread A is waiting for an event E that threads B, C or D are supposed to signal at some time in the future, but they are all waiting for an event F that thread G is about to signal as soon as it finishes processing some critical task.

This subsumes various deadlock patterns too, which are causal loops where thread A is waiting for an event AB that thread B will signal as soon as thread A signals an event BA that thread B is waiting for:

[Diagram: Thread A and Thread B, each waiting for an event the other signals]

In this context, “Event” means any type of synchronization object, critical section, LPC/RPC reply or data arrival through some IPC channel, and not only a Win32 event object or kernel _KEVENT.
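The causal relations described above can be modeled as a small graph walk. The following is a minimal sketch, not a debugger feature: it assumes each thread waits on at most one event and each event has a single thread expected to signal it (a simplification of the "B, C or D" case), and the function and dictionary names are hypothetical.

```python
from typing import Dict, List, Tuple


def follow_wait_chain(
    start: str,
    waits_on: Dict[str, str],      # thread -> event it is waiting for
    signaled_by: Dict[str, str],   # event -> thread expected to signal it
) -> Tuple[List[str], bool]:
    """Walk thread -> event -> thread links from `start`.

    Returns the chain of visited names and a flag that is True
    when the walk returns to an already visited thread (a causal loop).
    """
    chain = [start]
    seen = {start}
    thread = start
    while thread in waits_on:
        event = waits_on[thread]
        chain.append(event)
        owner = signaled_by.get(event)
        if owner is None:
            break  # nobody is known to signal this event
        chain.append(owner)
        if owner in seen:
            return chain, True  # causal loop: deadlock
        seen.add(owner)
        thread = owner
    return chain, False


# Plain wait chain: A waits for E (signaled by B), B waits for F (signaled by G).
print(follow_wait_chain("A", {"A": "E", "B": "F"}, {"E": "B", "F": "G"}))

# Deadlock: A waits for AB (signaled by B), B waits for BA (signaled by A).
print(follow_wait_chain("A", {"A": "AB", "B": "BA"}, {"AB": "B", "BA": "A"}))
```

A chain that ends at a thread with no recorded wait (thread G here) points at the thread worth inspecting first; a chain that loops back is a deadlock.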

As the first example of the Wait Chain pattern I show a process being terminated and waiting for another thread to finish. In other words, considering thread termination as an event itself, the main process thread is waiting for the second thread object to be signaled. The second thread tries to cancel a previous I/O request directed to some device. However, that IRP is not cancellable and the process hangs. This can be depicted in the following diagram:

[Diagram: Thread A waits for Thread B (Event A), which waits for Event B]

where Thread A is our main thread waiting for Event A, which is thread B itself waiting for I/O cancellation (Event B). Their stack traces are:

THREAD 8a3178d0 Cid 04bc.01cc Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable

8af2c920 Thread

Not impersonating

DeviceMap e1032530

Owning Process 89ff8d88 Image: processA.exe

Wait Start TickCount 80444 Ticks: 873 (0:00:00:13.640)

Context Switch Count 122 LargeStack

UserTime 00:00:00.015

KernelTime 00:00:00.156

Win32 Start Address 0x010148a4

Start Address 0x77e617f8

Stack Init f3f29000 Current f3f28be8 Base f3f29000 Limit f3f25000 Call 0

Priority 15 BasePriority 13 PriorityDecrement 0

ChildEBP RetAddr

f3f28c00 80833465 nt!KiSwapContext+0x26

f3f28c2c 80829a62 nt!KiSwapThread+0x2e5

f3f28c74 8094c0ea nt!KeWaitForSingleObject+0x346 ; stack trace with arguments shows the first parameter as 8af2c920

f3f28d0c 8094c63f nt!PspExitThread+0x1f0

f3f28d24 8094c839 nt!PspTerminateThreadByPointer+0x4b

f3f28d54 8088978c nt!NtTerminateProcess+0x125

f3f28d54 7c8285ec nt!KiFastCallEntry+0xfc


THREAD 8af2c920 Cid 04bc.079c Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable

Owning Process 89ff8d88 Image: processA.exe

Wait Start TickCount 81312 Ticks: 5 (0:00:00:00.078)

Context Switch Count 169 LargeStack

UserTime 00:00:00.000

KernelTime 00:00:00.000

Win32 Start Address 0x77da3ea5

Start Address 0x77e617ec

Stack Init f3e09000 Current f3e08bac Base f3e09000 Limit f3e05000 Call 0

Priority 13 BasePriority 13 PriorityDecrement 0

f3e08d4c 7c8285ec nt!KiServiceExit+0x56
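The link between the two threads above (the first thread's wait object 8af2c920 is the address of the second thread) can be picked out of saved debugger output mechanically. This is a rough sketch that assumes the text layout shown in the listings above; the regular expressions and function names are hypothetical and would need adjusting for other output formats.

```python
import re

# Assumed layout: "THREAD <addr> Cid ..." starts a thread block, and a wait
# object appears on its own line as "<addr> <ObjectType>".
THREAD_RE = re.compile(r"^THREAD\s+([0-9a-f]{8})\s+Cid\s+\S+", re.IGNORECASE)
WAIT_OBJ_RE = re.compile(r"^([0-9a-f]{8})\s+(\w+)\s*$", re.IGNORECASE)


def parse_wait_objects(text):
    """Map each thread address to the (object address, type) pairs it waits on."""
    waits = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = THREAD_RE.match(line)
        if match:
            current = match.group(1)
            waits[current] = []
            continue
        match = WAIT_OBJ_RE.match(line)
        if match and current is not None:
            waits[current].append((match.group(1), match.group(2)))
    return waits


def thread_to_thread_waits(waits):
    """Return (waiter, target) pairs where the wait object is another listed thread."""
    return [
        (waiter, addr)
        for waiter, objects in waits.items()
        for addr, kind in objects
        if kind == "Thread" and addr in waits
    ]


sample = """\
THREAD 8a3178d0 Cid 04bc.01cc Teb: 7ffdf000 Win32Thread: bc1b6e70 WAIT: (Unknown) KernelMode Non-Alertable
8af2c920 Thread
Owning Process 89ff8d88 Image: processA.exe
THREAD 8af2c920 Cid 04bc.079c Teb: 7ffd7000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable
Owning Process 89ff8d88 Image: processA.exe
"""
print(thread_to_thread_waits(parse_wait_objects(sample)))
```

The sketch only finds thread-on-thread waits; waits on events, mutants or IRP completion still have to be followed by hand.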

By inspecting the IRP we can see the device it was directed to, and see that it has the cancel bit set but doesn’t have a cancel routine:

0: kd> !irp 8ad26260 1

Irp is active with 5 stacks 4 is current (= 0x8ad2633c)

No Mdl: No System Buffer: Thread 8af2c920: Irp stack trace


MANUAL DUMP (PROCESS)

Now I discuss the Manual Dump pattern as seen from process memory dumps. It is not possible to reliably identify manual dumps here because a debugger or another process dumper might have been attached to the process noninvasively, leaving no traces of intervention, so we can only rely on the following information:

Comment field

Loading Dump File [C:\kktools\userdump8.1\x64\notepad.dmp]

User Mini Dump File with Full Memory: Only application data is available

Comment: 'Userdump generated complete user-mode minidump with Standalone function on COMPUTER-NAME'

Absence of exceptions

Loading Dump File [C:\UserDumps\notepad.dmp]

User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols

Executable search path is:

Windows Vista Version 6000 MP (2 procs) Free x64

Product: WinNt, suite: SingleUserTS

Debug session time: Mon Dec 17 16:31:31.000 2007 (GMT+0)

System Uptime: 0 days 0:45:11.148

Process Uptime: 0 days 0:00:36.000

user32!ZwUserGetMessage+0xa:

00000000`76c8e6aa c3 ret

0:000> ~*kL

0 Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen

Child-SP RetAddr Call Site


Wake debugger exception

Loading Dump File [C:\UserDumps\notepad2.dmp]

User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols

Executable search path is:

Windows Vista Version 6000 MP (2 procs) Free x64

Product: WinNt, suite: SingleUserTS

Debug session time: Mon Dec 17 16:35:37.000 2007 (GMT+0)

System Uptime: 0 days 0:49:13.806

Process Uptime: 0 days 0:02:54.000

This dump file has an exception of interest stored in it.

The stored exception information can be accessed via .ecxr.

(314.1b4): Wake debugger - code 80000007 (first/second chance not available)

user32!ZwUserGetMessage+0xa:

00000000`76c8e6aa c3 ret

Break instruction exception

Loading Dump File [C:\UserDumps\notepad3.dmp]

User Mini Dump File with Full Memory: Only application data is available

Symbol search path is: srv*c:\mss*http://msdl.microsoft.com/download/symbols

Executable search path is:

Windows Vista Version 6000 MP (2 procs) Free x64

Product: WinNt, suite: SingleUserTS

Debug session time: Mon Dec 17 16:45:15.000 2007 (GMT+0)

System Uptime: 0 days 0:58:52.699

Process Uptime: 0 days 0:14:20.000

This dump file has an exception of interest stored in it.

The stored exception information can be accessed via .ecxr.

ntdll!DbgBreakPoint:

00000000`76ecfdf0 cc int 3

0:001> ~*kL

0 Id: 1b8.ed4 Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen

Child-SP RetAddr Call Site


# 1 Id: 1b8.ec4 Suspend: 1 Teb: 000007ff`fffda000 Unfrozen

Child-SP RetAddr Call Site

00000000`030df798 00000000`76f633e8 ntdll!DbgBreakPoint

00000000`030df7a0 00000000`76d7cdcd ntdll!DbgUiRemoteBreakin+0x38

00000000`030df7d0 00000000`76ecc6e1 kernel32!BaseThreadInitThunk+0xd

00000000`030df800 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

The latter might also be some assertion statement in the code leading to a process crash, as in the following instance of the Dynamic Memory Corruption pattern (heap corruption, page 257):

09aef0bc 77fb76aa ntdll!DbgBreakPoint

09aef0c4 77fa65c2 ntdll!RtlpBreakPointHeap+0x26

09aef2bc 77fb5367 ntdll!RtlAllocateHeapSlowly+0x212

09aef340 77fa64f6 ntdll!RtlDebugAllocateHeap+0xcb

09aef540 77fcc9e3 ntdll!RtlAllocateHeapSlowly+0x5a

09aef854 786f1ee4 rpcrt4!I_RpcGetBufferWithObject+0x6e

09aef860 786f1ea4 rpcrt4!I_RpcGetBuffer+0xb

09aef86c 78754762 rpcrt4!NdrGetBuffer+0x2b

09aefab8 796d78b5 rpcrt4!NdrClientCall2+0x3f9

09aefac8 796d7821 advapi32!LsarOpenPolicy2+0x14

09aefb1c 796d8b04 advapi32!LsaOpenPolicy+0xaf

09aefb84 796d8aa9 advapi32!LookupAccountSidInternal+0x63

09aefbac 0aaf5d8b advapi32!LookupAccountSidW+0x1f

WARNING: Stack unwind information not available. Following frames may be
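The four indicators discussed in this section (comment field, absence of exceptions, wake debugger exception, break instruction exception) can be checked against a saved debugger log. The sketch below is only a heuristic under stated assumptions: it matches the literal strings that appear in the listings above, the function name and hint labels are hypothetical, and, as the text warns, no result reliably proves that a dump was taken manually.

```python
def manual_dump_hints(log):
    """Return the manual-dump indicators found in the text of a debugger log."""
    hints = []
    lines = [line.strip() for line in log.splitlines()]

    # Indicator 1: a comment field written by the dumping tool.
    if any(line.startswith("Comment:") for line in lines):
        hints.append("comment field")

    # Indicator 2: no stored exception at all.
    if "This dump file has an exception of interest stored in it" not in log:
        hints.append("no stored exception")

    # Indicator 3: wake debugger exception, code 80000007 as shown above.
    if "Wake debugger - code 80000007" in log:
        hints.append("wake debugger exception")

    # Indicator 4: break instruction. A DbgUiRemoteBreakin frame suggests a
    # debugger break-in; without it the breakpoint may be an assertion in code.
    if "ntdll!DbgBreakPoint" in log:
        if "DbgUiRemoteBreakin" in log:
            hints.append("break instruction exception (remote break-in)")
        else:
            hints.append("break instruction exception (possibly an assertion)")
    return hints


log = (
    "This dump file has an exception of interest stored in it.\n"
    "(314.1b4): Wake debugger - code 80000007 (first/second chance not available)\n"
)
print(manual_dump_hints(log))
```

Separating the remote break-in case from a bare breakpoint mirrors the distinction made above between a debugger-injected thread and a heap-corruption assertion.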


WAIT CHAIN (CRITICAL SECTIONS)

Here is another example of the Wait Chain pattern (page 481) where the objects are critical sections.

WinDbg can detect them if we use the !analyze -v -hang command, but it detects only one chain and not necessarily the longest or widest one in cases with multiple wait chains:

Looking at threads we see this chain, and we also see that the final thread is blocked waiting for a socket:

ChildEBP RetAddr Args to Child

0fe2a09c 7c942124 71933a09 00000b50 00000001 ntdll!KiFastSystemCallRet

0fe2a0a0 71933a09 00000b50 00000001 0fe2a0c8 ntdll!NtWaitForSingleObject+0xc

0fe2a0dc 7194576e 00000b50 00000234 00000000 mswsock!SockWaitForSingleObject+0x19d

0fe2a154 71a12679 00000234 0fe2a1b4 00000001 mswsock!WSPRecv+0x203

0fe2a190 62985408 00000234 0fe2a1b4 00000001 WS2_32!WSARecv+0x77

0fe2a1d0 6298326b 00000234 0274ebc6 00000810 component!wait+0x338


If we look at all held critical sections we see another thread that is blocking more than 125 other threads:


0ff2ffec 00000000 77b9b4bc 060cf9a0 00000000 kernel32!BaseThreadStart+0x34

Searching for any thread waiting for critical section 051e4bd8 gives us:

8 Id: 8d8.924 Suspend: 1 Teb: 7ffd5000 Unfrozen

ChildEBP RetAddr Args to Child
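Because the automated analysis reports only one chain, it can help to tabulate who waits for which critical section and rank the owners by how many threads they block. The following is a minimal sketch with hypothetical thread and section names (T1, CS1 and so on); in practice the two dictionaries would be filled in from the list of locked critical sections and the thread stacks.

```python
from collections import defaultdict


def blocked_counts(waiting_on, owner_of):
    """Count how many threads each owner blocks, directly or transitively.

    waiting_on: thread -> critical section it is waiting for
    owner_of:   critical section -> thread that currently owns it
    """
    blocked_by = defaultdict(set)  # owner -> threads waiting directly on it
    for waiter, section in waiting_on.items():
        owner = owner_of.get(section)
        if owner is not None and owner != waiter:
            blocked_by[owner].add(waiter)

    def collect(thread, seen):
        # Depth-first walk over everything stuck behind `thread`.
        for waiter in blocked_by.get(thread, ()):
            if waiter not in seen:
                seen.add(waiter)
                collect(waiter, seen)
        return seen

    return {t: len(collect(t, set()) - {t}) for t in list(blocked_by)}


# Hypothetical data: T1 and T2 wait for CS1 (owned by T3); T3 waits for CS2 (owned by T4).
counts = blocked_counts(
    {"T1": "CS1", "T2": "CS1", "T3": "CS2"},
    {"CS1": "T3", "CS2": "T4"},
)
print(counts)
print(max(counts, key=counts.get))
```

The owner with the largest count sits at the head of the widest chain; what that thread itself is waiting for (a socket in the example above) is then the place to look.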

PART 4: CRASH DUMP ANALYSIS ANTIPATTERNS

Alien Component 493

ALIEN COMPONENT

In any domain of activity where patterns exist we can find anti-patterns too. They are bad solutions for recurrent problems in specific contexts. One of them I would like to introduce briefly is called Alien Component. In essence, when every technique fails or we run out of WinDbg commands, we look at some innocent component we have never seen before or don’t have symbols for, be it some driver or hook. Of course, this component cannot be the component developed by the company we are working for.


ZIPPOCRICY

Let’s define Zippocricy, the common sin in software support environments worldwide: someone gets something from a customer in archived form and, without checking the contents, forwards it to another person in the support chain. By the time the evidence gets unzipped somewhere, checked and found corrupt or irrelevant, the customer has suffered not hours but days.

This happens not only with crash dumps but with any type of problem evidence.
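Checking an archive before forwarding it takes only a few lines. The sketch below is one possible check, assuming the evidence arrives as a zip file containing at least one .dmp file; the function name and problem messages are hypothetical. It relies on the on-disk signatures of Windows dumps: user minidumps begin with MDMP, and kernel or complete dumps begin with PAGEDUMP (32-bit) or PAGEDU64 (64-bit).

```python
import io
import zipfile

# Leading bytes of Windows dump files.
DUMP_SIGNATURES = (b"MDMP", b"PAGEDUMP", b"PAGEDU64")


def check_archive(source):
    """Return a list of problems; an empty list means the archive looks sane.

    `source` is a path or a binary file-like object.
    """
    problems = []
    try:
        with zipfile.ZipFile(source) as archive:
            bad = archive.testzip()  # name of first corrupt member, or None
            if bad is not None:
                problems.append(f"corrupt member: {bad}")
            dumps = [n for n in archive.namelist() if n.lower().endswith(".dmp")]
            if not dumps:
                problems.append("no .dmp file in archive")
            for name in dumps:
                with archive.open(name) as member:
                    head = member.read(8)
                if not head.startswith(DUMP_SIGNATURES):
                    problems.append(f"{name}: not a recognized dump signature")
    except zipfile.BadZipFile:
        problems.append("not a valid zip archive")
    return problems


# Usage with an in-memory archive holding a fake minidump header.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notepad.dmp", b"MDMP" + b"\x00" * 16)
buffer.seek(0)
print(check_archive(buffer))
```

A signature check does not prove the dump is complete or relevant, but it catches the empty, truncated or mislabeled files that otherwise cost days.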


Word of Mouth 495

WORD OF MOUTH

Many engineers say, “I didn’t know about this debugging command, let’s use it!” after a training session or after reading other people’s analyses of crash dumps. A year later we hear the same phrase from them about another debugging command. In the meantime they continue to use the same set of commands they know until they hear of the old new one.

This is a manifestation of the Word of Mouth anti-pattern.

General solution: know your tools. Study them proactively.

Example solution: periodically read and re-read WinDbg help.


WRONG DUMP

A customer reports that application.exe crashes and we ask for a dump file. We get a dump, open it and see that the dump is not from our application.exe. We ask for a print spooler crash dump and we get an mplayer.exe crash dump. I originally thought about calling it the Wrong Dump pattern and placing it into the patterns category, but after writing about Zippocricy (page 494) I clearly see it as an anti-pattern. It is not rocket science to check the process name in a dump file before sending it for analysis:

Load the user process dump in WinDbg.

Type the command .symfix; .reload; !analyze -v and wait until WinDbg is not busy analyzing.

Find PROCESS_NAME: in the output. We get something like:

PROCESS_NAME: spoolsv.exe

We can also use dumpchk.exe from Debugging Tools for Windows (http://support.citrix.com/article/CTX108825).

Another example is when we ask for a complete memory dump but get a kernel dump or various minidumps. Fortunately, the Citrix DumpCheck Explorer extension can warn users before they submit a dump file.
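Once the output of !analyze -v has been saved to a text file, the PROCESS_NAME check described above is easy to script. This is a small sketch assuming the output contains a line of the form shown earlier; the function names are hypothetical.

```python
import re


def process_name_from_analysis(output):
    """Return the value of the PROCESS_NAME: line, or None if it is absent."""
    match = re.search(r"^\s*PROCESS_NAME:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def is_expected_dump(output, expected):
    """True when the dump comes from the expected process (case-insensitive)."""
    name = process_name_from_analysis(output)
    return name is not None and name.lower() == expected.lower()


analysis = "FAULTING_IP: ...\nPROCESS_NAME: spoolsv.exe\n"
print(process_name_from_analysis(analysis))
print(is_expected_dump(analysis, "application.exe"))
```

Running such a check at the point where the dump is received, rather than where it is analyzed, is what removes the round trip to the customer.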


Fooled by Description 497

FOOLED BY DESCRIPTION

From my observation, an engineer with a software development background opens a crash dump after glancing at the problem description provided by a customer, or even without reading it first. Only if the problem is not immediately obvious from the memory dump file will the engineer read the problem description thoroughly. By contrast, an engineer with a technical support or system administration background will thoroughly read the problem description first. In the latter case the description might influence the direction of analysis.

Here is an example. The description says “slow application start” and we have a memory dump from a process. An engineer with a technical support background will most likely look for hang patterns inside the dump. An engineer with experience writing unmanaged applications in C and C++ will open the memory dump and check the exception stored in it, and if it is a breakpoint the suspicion might arise that the memory dump was taken manually because of the hanging process. Based on the analysis, the engineer might even correct the problem description or add questions that clarify the discrepancy between what is seen in the dump and what users perceive.


NEED THE CRASH DUMP

This might be the first thought when an engineer gets a stack trace fragment without symbolic information. It is usually based on the following presupposition:

We need an actual dump file to suggest further troubleshooting steps.

This is not actually true unless it is the first time we have the problem and get a stack trace for it. Consider the following fragment from a bugcheck kernel dump, with no symbols applied because the customer didn’t have them:

b90529f8 8085eced nt!KeBugCheckEx+0x1b

b9052a70 8088c798 nt!MmAccessFault+0xb25

b9052a70 bfabd940 nt!_KiTrap0E+0xdc

WARNING: Stack unwind information not available. Following frames may be wrong.

b9052b14 bfabe452 MyDriver+0x27940

We can convert module+offset information into module!function+offset2 using MAP files, or using the DIA SDK (Debug Interface Access SDK) to query PDB files if we know the module timestamp. This might be seen as a tedious exercise, but we don’t need to do it if we keep raw stack trace signatures in some database when doing crash dump analysis. If we use our own symbol servers we might want to remove references to them and reload symbols, then redo the previous stack trace commands.
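The module+offset to module!function+offset2 conversion amounts to finding the nearest preceding symbol. Below is a minimal sketch assuming the symbol addresses have already been extracted from a MAP file as offsets from the module base; the symbol names and addresses are invented for illustration and do not come from any real driver.

```python
import bisect


def resolve(offset, symbols):
    """Translate a module offset into 'function+0xNN'.

    symbols: list of (start_offset, name) pairs sorted by start_offset.
    Returns None when the offset lies before the first symbol.
    """
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, offset) - 1  # last symbol at or below offset
    if index < 0:
        return None
    start, name = symbols[index]
    return f"{name}+{offset - start:#x}"


# Hypothetical symbol table for MyDriver.
mydriver_symbols = [
    (0x27000, "HypotheticalInit"),
    (0x27900, "HypotheticalDispatch"),
    (0x28000, "HypotheticalUnload"),
]
print("MyDriver!" + resolve(0x27940, mydriver_symbols))
```

Since a MAP file records where functions start but not where they end, an offset far past the last symbol resolves to that symbol with a large displacement and should be treated with suspicion.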

In this case similar bugcheck crash dumps had been analyzed months earlier, and the engineers had saved stack traces prior to applying symbols. This helped to point to the solution without requesting the crash dump corresponding to that stack trace.
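Keeping raw signatures can be as simple as a dictionary keyed by the unsymbolized frames. The sketch below assumes the three-column layout of the fragment above (ChildEBP, RetAddr, frame name); the function names and the stored note are hypothetical, and a real database would persist the entries rather than hold them in memory.

```python
import re

# Assumed frame layout: "<ChildEBP> <RetAddr> <module!function+off or module+off>".
FRAME_RE = re.compile(r"^[0-9a-f]{8}\s+[0-9a-f]{8}\s+(\S+)$", re.IGNORECASE)


def signature(stack_text):
    """Return the tuple of raw frame names, ignoring warnings and blank lines."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(1))
    return tuple(frames)


def remember(database, stack_text, note):
    """Store a note about a previously analyzed stack trace."""
    database[signature(stack_text)] = note


def lookup(database, stack_text):
    """Return the note saved for an identical raw stack trace, if any."""
    return database.get(signature(stack_text))


fragment = """\
b90529f8 8085eced nt!KeBugCheckEx+0x1b
b9052a70 8088c798 nt!MmAccessFault+0xb25
b9052a70 bfabd940 nt!_KiTrap0E+0xdc
WARNING: Stack unwind information not available. Following frames may be wrong.
b9052b14 bfabe452 MyDriver+0x27940
"""
database = {}
remember(database, fragment, "hypothetical note: see the earlier analysis")
print(lookup(database, fragment))
```

Exact matching is deliberately strict; a different driver build shifts the offsets and yields a different signature, which is usually the desired behavior.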


Be Language 499

BE LANGUAGE

This is about excessive use of “is” and was inspired by Alfred Korzybski’s notion of how “is” affects our understanding of the world. In the context of technical support, the use of certain verbs sometimes leads to wrong troubleshooting and debugging paths. For example, the following phrase:

It is our pool tag. It is effected by driver A, driver B and driver C.

Surely driver A, driver B and driver C were not developed by the same company that introduced the problem pool tag (smells like Alien Component here, page 493). Unless supported by solid evidence, the better phrase would be:

It is our pool tag. It might have been effected by driver A, driver B or driver C.

I’m not advocating completely eradicating “be” verbs, but being conscious of their use.

Posted: 24/12/2013, 18:15
