SUSPENDING THREADS
Suspending threads during a live kernel debugging session can be useful for
debugging or reproducing race condition issues. For example, we may have one thread
that depends on another thread finishing its work earlier. Sometimes, very rarely, the
latter thread finishes after the moment the first thread would expect it. In order to
model this race condition we can simply patch the prologue code of the second thread
worker function with a ret instruction. This has the same effect as suspending the thread,
so it cannot produce the required data.
Note: ~n (suspend) and ~f (freeze) are for user mode live debugging only.
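The race being modeled can also be illustrated outside the debugger. The following Python sketch is only an analogy, not a debugger technique: the function name run_scenario, the producer_patched flag and the value 42 are hypothetical, and returning early from the producer stands in for a worker function whose prologue was patched with ret.

```python
import threading


def run_scenario(producer_patched: bool, timeout: float = 0.5) -> str:
    """Model a consumer that depends on a producer finishing its work first."""
    data_ready = threading.Event()
    result = {}

    def producer() -> None:
        # A prologue patched with `ret` is modeled by returning at once,
        # so the producer never publishes the required data.
        if producer_patched:
            return
        result["value"] = 42  # hypothetical payload
        data_ready.set()

    worker = threading.Thread(target=producer)
    worker.start()
    worker.join()

    # The consumer expects the data to be available by now.
    if data_ready.wait(timeout):
        return f"consumer got {result['value']}"
    return "consumer timed out: required data never produced"


if __name__ == "__main__":
    print(run_scenario(producer_patched=False))
    print(run_scenario(producer_patched=True))
```

With the producer "patched", the consumer always hits its failure path, which is the point of the approach: a rare timing problem becomes reproducible on demand.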
HEAP STACK TRACES
If we have the user mode stack trace DB enabled on Windows 2003 Server for some
service or application, and we get a crash dump and try to get saved stack traces using
the !heap extension command, we might get these errors:
0:000> !heap -k -h 000a0000
Heap entries for Segment00 in Heap 000a0000
000a0c50: 00c50 00040 [01] - busy (40)
000a0c90: 00040 01818 [07] - busy (1800), tail fill - unable to
read heap entry extra at 000a24a0
000a24a8: 01818 00030 [07] - busy (18), tail fill - unable to
read heap entry extra at 000a24d0
000a24d8: 00030 005a0 [07] - busy (588), tail fill - unable to
read heap entry extra at 000a2a70
The solution is to use the old Windows 2000 extension ntsdexts.dll:
HYPERTEXT COMMANDS
Recent versions of WinDbg have a RichEdit command output window that allows
syntax highlighting and can simulate hyperlinks.
A tooltip from WindowHistory shows its window class:
There is also Debugger Markup Language (DML) and new commands that take
advantage of it. For documentation please look at dml.doc located in your Debugging
Tools for Windows folder.
Here is the output of some commands (because WinDbg uses the variant of
RichEdit that doesn’t allow copy/paste formatting, I put screenshots of the output):
!dml_proc
Here we can click on a process link and get the list of threads:
We can click either on the “Full details” link or on an individual thread link to see its
call stack. If we select the “user-mode state” link we switch to the process context
automatically (useful for complete memory dumps):
kd> .process /p /r 0x8342c128
Implicit process is now 8342c128
Loading User Symbols
We can also navigate frames and local variables very easily:
If we click on a thread name (<No name> here) we get its context:
Clicking on a number sets the scope and shows local variables (if we have full PDB
files):
A similar command is kM:
Another useful command is lmD where we can easily inspect modules:
ANALYZING HANGS FASTER
A Google search shows that the additional parameter (-hang) to the venerable
!analyze -v command is rarely used. Here is the command we can use if we get a
manually generated dump, there is no exception in it reported by !analyze -v, and
subsequent visual inspection of ~*kv output doesn’t show anything suspicious that
would lead to hidden exception(s):
!analyze -hang -v
Then we should always double check with the !locks command because there could
be multiple hang conditions in a crash dump.
The same parameter can be used for kernel memory dumps too. But double
checking ERESOURCE locks (!locks), kernel threads (!stacks) and DPC queues (!dpcs)
manually is highly recommended.
There are cases where we need a triple dereference (or even a quadruple
dereference) done on a range of memory. Here we can utilize WinDbg scripts. The key is to use
the $p pseudo-register, which holds the value printed by the last d* command (dd, dps, etc.):
.for (r $t0=00000000`004015a2, $t1=4; @$t1 >= 0; r $t1=$t1-1,
$t0=$t0+$ptrsize) { dps @$t0 l1; dps $p l1; dps $p l1; .printf "\n" }
where $t0 and $t1 are pseudo-registers holding the starting address of a memory block
(we use the 64-bit format) and the number of objects to be triple dereferenced and
displayed. $ptrsize is the pointer size. The script is platform independent (it can be used on
both 32-bit and 64-bit targets). For example:
If we want quadruple dereferenced memory we just need to add an additional
dps $p l1; to the .for loop body. With this script even a double dereference looks much
better because it shows symbol information for the first dereference too, whereas the dpp
command shows the symbol name only for the second dereference.
Another less “elegant” variation without the $p pseudo-register uses the poi operator,
but we need a .catch block to prevent script termination on an invalid memory access:
Memory access error at ') '
We can also use !list extension but more formatting is necessary:
Cannot read next element at 458df033
Cannot read next element at 33f4458b
The advantage of !list is the unlimited number of pointer dereferences until an invalid
address is reached.
FINDING A NEEDLE IN A HAY
There is a good WinDbg command to list unique threads in a process. Some
processes have so many threads that it is difficult to find anomalies in the output of
the ~*kv command, especially when most threads are similar, for example waiting for an
LPC reply. In this case we can use the !uniqstack command to list only threads with unique
call stacks and then list duplicate thread numbers.
0:046> !uniqstack
Processing 51 threads, please wait
0 Id: 1d50.1dc0 Suspend: 1 Teb: 7fffe000 Unfrozen
Priority: 0 Priority class: 32
26 Id: 1d50.44ec Suspend: 1 Teb: 7ffaf000 Unfrozen
Priority: 1 Priority class: 32
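The idea behind grouping threads by call stack can be sketched in a few lines of Python. This is not how !uniqstack is implemented; it is a minimal model assuming each thread's stack is already available as a list of frame names. The function name unique_stacks and the sample frames are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def unique_stacks(stacks: Dict[int, List[str]]) -> List[Tuple[int, List[int], List[str]]]:
    """Group threads that share an identical call stack.

    Returns (first thread number, duplicate thread numbers, frames) per unique stack,
    in order of the first thread that showed each stack.
    """
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for thread_no in sorted(stacks):
        groups[tuple(stacks[thread_no])].append(thread_no)
    return [(threads[0], threads[1:], list(frames)) for frames, threads in groups.items()]


if __name__ == "__main__":
    # Made-up sample data: most threads wait in the same place, one differs.
    waiting = ["ntdll!NtRequestWaitReplyPort", "rpcrt4!LRPC_CCALL::SendReceive"]
    sample = {0: ["app!main"], 1: waiting, 2: waiting, 3: waiting}
    for first, duplicates, frames in unique_stacks(sample):
        print(f"thread {first}, duplicates {duplicates}:")
        for frame in frames:
            print("   ", frame)
```

With many similar waiting threads collapsed into one entry, the odd thread out is much easier to spot.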
GUESSING STACK TRACE
Sometimes, instead of looking at raw stack data to identify all modules that might
have been involved in a problem thread, we can use the following old Windows 2000
kdex2x86 WinDbg extension command that can even work with Windows 2003 or XP
kernel memory dumps:
4: kd> !w2kfre\kdex2x86.stack -?
!stack - Do stack trace for specified thread
Usage : !stack [-?ha[0|1]] [address]
Arguments :
-?,-h - display help information
-a - specifies display mode. This option is off, in default. If this
option is specified, output stack trace in detail.
-0,-1 - specifies filter level for display. Default filter level is 0. In
level 0, display stackframes that are guessed return-adresses for reason
of its value and previous mnemonic. In level 1, display stackframes that
call other stackframe or is called by other stackframe, besides level 0.
address - specifies thread address. When address is omitted, do stack
trace for the current thread.
For example:
Loading Dump File [MEMORY.DMP]
Kernel Summary Dump File: Only kernel address space is available
Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (8 procs) Free
x86 compatible
Product: Server, suite: Enterprise TerminalServer
Built by: 3790.srv03_sp2_gdr.070304-2240
Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8
Debug session time: Mon Jun 11 14:49:21.541 2007 (GMT+1)
System Uptime: 0 days 2:10:11.877
4: kd> k
ChildEBP RetAddr
b7a24e84 80949b48 nt!KeBugCheckEx+0x1b
b7a24ea0 80949ba4 nt!PspUnhandledExceptionInSystemThread+0x1a
b7a25ddc 8088e062 nt!PspSystemThreadStartup+0x56
00000000 00000000 nt!KiThreadStartup+0x16
4: kd> !w2kfre\kdex2x86.stack
T Address RetAddr Called Procedure
*2 B7A24E68 80827C63 nt!KeBugCheck2(0000007E, C0000005, BFE5FEEA, );
*2 B7A24E88 80949B48 nt!KeBugCheckEx(0000007E, C0000005, BFE5FEEA, );
*2 B7A24EA4 80949BA4 nt!PspUnhandledExceptionInSystemThread(B7A24EC8,
80881801, B7A24ED0, );
*0 B7A24EAC 80881801 dword ptr EAX(B7A24ED0, 00000000, B7A24ED0, );
*1 B7A24ED4 8088ED4E dword ptr ECX(B7A25378, B7A25DCC, B7A25074, );
*1 B7A24EF8 8088ED20 nt!ExecuteHandler2(B7A25378, B7A25DCC, B7A25074, );
*1 B7A24F1C 80877C0C nt!RtlpExecuteHandlerForException(B7A25378, B7A25DCC,
B7A25074, );
*0 B7A24F5C 808914F7 nt!RtlClearBits(893E3BF8, 0000014A, 00000001, );
*1 B7A24FA8 8082D58F nt!RtlDispatchException(B7A25378, B7A25074,
*1 B7A25188 F764C1C7 dword ptr EAX(88F4F990, B7A251EC, 88DFB000, );
*1 B7A251A4 F764C20E termdd!_IcaCallSd(88F876A0, 00000002, B7A251EC, );
*0 B7A25200 8082196C nt!_SEH_epilog(894E8648, 898B0020, 80A5A530, );
*0 B7A25248 8082196C nt!_SEH_epilog(8082DFC3, 894E8648, B7A25294, );
*1 B7A2524C 8082DFC3 dword ptr [EBP-14](894E8648, B7A25294, B7A25288, );
*1 B7A2529C 80A5C199 nt!KiDeliverApc(00000000, 00000000, 00000000, );
*1 B7A252BC 80A5C3D9 hal!HalpDispatchSoftwareInterrupt(898B0001, 00000000,
00000000, );
*1 B7A252D8 80A5C456 hal!HalpCheckForSoftwareInterrupt(00000001, 898B0000,
B7A25300, );
*1 B7A252E8 8083129E hal!KfLowerIrql(898B0020, 894E8648, 89468504, );
*1 B7A25304 8082AB7B nt!KiExitDispatcher(894E8648, 894E8608,
*1 B7A254CC BFE6FCB6 component+0002EBE0(BFEBC0A0, BFEBC038, BFEBBF80, );
*1 B7A255C8 80A5C456 hal!HalpCheckForSoftwareInterrupt(00000000, 8CE03500,
*0 B7A2565C F7174943 Ntfs!_SEH_epilog(00000000, B7A257A0, 88F103D8, );
*1 B7A2568C 80A5C456 hal!HalpCheckForSoftwareInterrupt(00000000, 00000001,
*1 B7A25714 8081E1E9 nt!KeSetEvent(00A25C90, 00000001, 00000000, );
*1 B7A2573C F7133177 Ntfs!NtfsCleanupIrpContext(B7A25750, B7A257A4,
*0 B7A2598C 808347E4 nt!ProbeForWrite(0032FD14, 000002E4, 808348C6, );
*0 B7A25998 808348C6 nt!_SEH_epilog(7FFDA000, 894CA9C8, 00000000, );
*0 B7A259A8 F713435F Ntfs!ExFreeToNPagedLookasideList(F7150420, 88F93EF8,
B7A25ACC, );
*0 B7A259D8 8082CBCF nt!KiEspFromTrapFrame(C0001978, 83F251EC,
*0 B7A25B38 8082196C nt!_SEH_epilog(8C22B848, 898B0020, 80A5A530, );
*0 B7A25B4C 8081C3DA nt!RtlpInterlockedPushEntrySList(00000000, 00000000,
8C22B808, );
*0 B7A25B80 8082196C nt!_SEH_epilog(8082DFC3, 8C22B848, B7A25BCC, );
*1 B7A25B84 8082DFC3 dword ptr [EBP-14](8C22B848, B7A25BCC, B7A25BC0, );
*1 B7A25BD4 80A5C199 nt!KiDeliverApc(00000000, 00000000, 00000000, );
*1 B7A25BF4 80A5C3D9 hal!HalpDispatchSoftwareInterrupt(898B0001, 00000000,
00000000, );
*1 B7A25C10 80A5C456 hal!HalpCheckForSoftwareInterrupt(00000001, 898B0000,
B7A25C38, );
*1 B7A25C20 8083129E hal!KfLowerIrql(898B0020, 8C22B848, 00000010, );
*1 B7A25C54 80A5C456 hal!HalpCheckForSoftwareInterrupt(F7757000, 00000002,
893F8BB0, );
*1 B7A25C64 8088DBAC hal!KfLowerIrql(B7A25C88, BFE6BA78, 00000000, );
*1 B7A25C78 80A5C1AE nt!KiDispatchInterrupt(B7A25CC0, B7A25D00,
*1 B7A25D18 80A5A56D hal!KfLowerIrql(00000001, BC14A018, BC5F9003, );
*1 B7A25D44 BFE708D4 component+000312D0(BFEBBF80, 00000000, 00000000, );
Another thread stack example:
4: kd> ~1
1: kd> k
ChildEBP RetAddr
f37fe9b4 f57e8407 tcpip!_IPTransmit+0x172c
f37fea24 f57e861a tcpip!TCPSend+0x604
f37fea54 f57e6edd tcpip!TdiSend+0x242
f37fea90 f57e1d13 tcpip!TCPSendData+0xbf
T Address RetAddr Called Procedure
*1 F37FE8D4 80A5C456 hal!HalpCheckForSoftwareInterrupt(00000000, 00000000,
*2 F37FEA28 F57E861A tcpip!TCPSend(8AF6F701, 7FEA6000, 001673CE, );
*2 F37FEA58 F57E6EDD tcpip!TdiSend(00000000, 00000000, 00000B55, );
*0 F37FEA88 F5722126 dword ptr [ESI+28](F58203C0, F37FEAAC, F57E1D13, );
*2 F37FEA94 F57E1D13 tcpip!TCPSendData(88FEE99C, 00EE5FA0, 88EE5EB0, );
*1 F37FEB78 8083129E hal!KfLowerIrql(88F24A58, 00000000, 8908AE01, );
*1 F37FEB94 8082B96B nt!KiExitDispatcher(00000000, 8908AE30,
00000000, );
*0 F37FEBF4 8082196C nt!_SEH_epilog(8082DFC3, 8908AE18, F37FEC40, );
*0 F37FEBF8 8082DFC3 dword ptr [EBP-14](8908AE18, F37FEC40, F37FEC34, );
*0 F37FEC2C 8098AA4A nt!ExpLookupHandleTableEntry(E18D5E38, 00000B55,
89315008, );
*2 F37FEC60 808F5E2F afd!AfdFastIoDeviceControl+000003A3(89435340,
This command is called a “heuristic stack walker” in the OSR NT Insider article
mentioned in the post about the Stack Overflow pattern (page 314) in kernel space.
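The core idea of a heuristic stack walk can be sketched in Python: scan raw stack values and keep those that fall inside a loaded module's address range. This is a simplification, not the extension's algorithm. According to its help text the extension also looks at the previous mnemonic, which this sketch does not do. The function name and the module end address below are hypothetical; only the kernel base 0x80800000 and the stack values come from the example output above.

```python
from typing import Dict, List, Tuple


def guess_return_addresses(
    stack_address: int,
    dwords: List[int],
    modules: Dict[str, Tuple[int, int]],
) -> List[Tuple[int, int, str]]:
    """Return (stack slot address, value, module name) for every 32-bit stack
    value that lies inside a known module's [start, end) range."""
    hits = []
    for index, value in enumerate(dwords):
        for name, (start, end) in modules.items():
            if start <= value < end:
                hits.append((stack_address + 4 * index, value, name))
                break
    return hits


if __name__ == "__main__":
    # The module end address is an assumption made for illustration only.
    modules = {"nt": (0x80800000, 0x80A50000)}
    # Saved EBP, return address and two bugcheck arguments from the first example.
    raw_stack = [0xB7A24EA0, 0x80949B48, 0x0000007E, 0xC0000005]
    for slot, value, module in guess_return_addresses(0xB7A24E84, raw_stack, modules):
        print(f"{slot:08X} {value:08X} {module}")
```

A range check alone produces false positives, for example stale return addresses left on the stack by earlier calls, which is why the output of such commands has to be read as a guess rather than as a reliable call chain.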
COPING WITH MISSING SYMBOLIC INFORMATION
Sometimes there is no public PDB file available for a module in a crash dump,
although we know they exist for different versions of the same module. The typical
example is when we have a private PDB file loaded automatically and we need access to
structure definitions, for example, _TEB or _PEB. In this case we need to force WinDbg
to load an additional PDB file just to be able to use these structure definitions. This can
be achieved by loading an additional module at a different address and forcing it to use
another public PDB file. At the same time we want to keep the original module
referencing the correct PDB file, albeit the private one. Let’s look at one concrete example.
We are trying to get stack limits for a thread by using the !teb command,
but we get an error:
0:000> !teb
TEB at 7efdd000
*** Your debugger is not using the correct symbols
***
*** In order for this command to work properly, your symbol path
*** must point to pdb files that have full type information
***
*** Certain pdb files (such as the public OS symbols) do not
*** contain the required information. Contact the group that
*** provided you with these symbols if you need this command to
The lm command shows that the symbol file was loaded and that it was the correct one,
so perhaps it was the private symbol file or the _TEB definition was missing in it:
0:000> lm m ntdll
start end module name
7d600000 7d6f0000 ntdll (pdb symbols) c:\websymbols\wntdll.pdb\
40B574C84D5C42708465A7E4A1E4D7CC2\wntdll.pdb
The size of wntdll.pdb is 1,091 KB. A search for other ntdll.pdb files finds one
with a bigger size, 1,187 KB, and we can append it to our symbol search path:
Then we can look at our symbol cache folder for ntdll.dll, choose a path to a
random one and load it at an address not occupied by other modules, forcing WinDbg to load
symbol files and ignore a mismatch, if any:
The additional ntdll.dll is now loaded at address 7e000000 and its module name
is ntdll_7e000000. Because we know the TEB address we can see the values of the _TEB
structure fields immediately (the output is shown in smaller font for better visual clarity):
+0x1e0 AppCompatFlagsUser : _ULARGE_INTEGER 0x0
+0x1e8 pShimData : (null)
+0x1ec AppCompatInfo : (null)
+0x1f0 CSDVersion : _UNICODE_STRING "Service Pack 2"
+0xbe0 glSectionInfo : (null)
+0xbe4 glSection : (null)
+0xbe8 glTable : (null)
+0xbec glCurrentRC : (null)
+0xfa0 NlsCache : (null)
+0xfa4 pShimData : (null)
+0xfa8 HeapVirtualAffinity : 0
+0xfac CurrentTransactionHandle : (null)
+0xfb8 SafeThunkCall : 0 ''
+0xfb9 BooleanSpare : [3] ""
Because StackBase and StackLimit are the second and the third double words, we
could have just dumped the first 3 double words at the TEB address:
0:000> dd 7efdd000 l3
7efdd000 0012fec0 00130000 0011c000
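Reading the stack limits off that dd line can be sketched in Python. The helper names parse_dd_line and stack_limits are hypothetical; the layout assumed is the 32-bit one described above, where the first double word is followed by StackBase and then StackLimit.

```python
from typing import List, Tuple


def parse_dd_line(line: str) -> List[int]:
    """Parse one line of dd output: an address followed by hex double words."""
    _address, *values = line.split()
    return [int(value, 16) for value in values]


def stack_limits(dd_line: str) -> Tuple[int, int, int]:
    """Return (StackBase, StackLimit, size in bytes) from the first three
    double words of a 32-bit TEB."""
    _first, stack_base, stack_limit = parse_dd_line(dd_line)[:3]
    return stack_base, stack_limit, stack_base - stack_limit


if __name__ == "__main__":
    base, limit, size = stack_limits("7efdd000  0012fec0 00130000 0011c000")
    print(f"StackBase={base:08x} StackLimit={limit:08x} committed={size:#x} bytes")
```

The stack grows downward, so StackBase is the higher address and the difference gives the currently committed stack range.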
RESOLVING SYMBOL MESSAGES
On one of my debugging workstations I couldn’t analyze kernel and complete
memory dumps from Windows 2003 Server R02. I was always getting this message:
*** ERROR: Symbol file could not be found. Defaulted to export symbols
for ntkrnlmp.exe -
An attempt to reload and overwrite PDB files using the .reload /o /f command
didn’t resolve the issue, but the following WinDbg command helped in troubleshooting:
1: kd> !sym noisy
noisy mode - symbol prompts on
Reloading symbol files showed that the default symbol path contained corrupt
DBGHELP: ntkrnlmp.pdb - file not found
*** ERROR: Symbol file could not be found. Defaulted to export symbols
for ntkrnlmp.exe -
DBGHELP: nt - export symbols
Deleting it and reloading symbols again showed problems with the file
downloaded from the MS symbol server too (it seems the compressed .pd_ file was never unpacked):
1: kd> .reload
SYMSRV: c:\symdownstream\ntkrnlmp.pdb\A91CA63E49A840F4A50509F90ADE10D52\ntkrnlmp.pd_
The file or directory is corrupted and unreadable.
DBGHELP: ntkrnlmp.pdb - file not found
*** ERROR: Symbol file could not be found. Defaulted to export symbols
for ntkrnlmp.exe -
DBGHELP: nt - export symbols
Removing the folder and reloading symbols finally resolved the problem:
1: kd> .reload
DBGHELP: nt - public symbols
c:\symdownstream\ntkrnlmp.pdb\A91CA63E49A840F4A50509F90ADE10D52\ntkrnlmp.pdb
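When a downstream symbol store is suspected of holding such leftovers, a quick scan can narrow down which folders to remove. The Python sketch below is a heuristic only, assuming that compressed downloads keep a trailing underscore in their extension, as ntkrnlmp.pd_ does above. The function name find_compressed_leftovers is hypothetical, and the path in the usage example is the one from the output above.

```python
from pathlib import Path
from typing import List


def find_compressed_leftovers(symbol_cache: Path) -> List[Path]:
    """Return files under `symbol_cache` whose extension ends with '_'
    (for example ntkrnlmp.pd_), sorted by path."""
    return sorted(
        path
        for path in symbol_cache.rglob("*")
        if path.is_file() and path.suffix.endswith("_")
    )


if __name__ == "__main__":
    cache = Path(r"c:\symdownstream")  # downstream store path from the example
    for leftover in find_compressed_leftovers(cache):
        # Only report; whether to delete the containing folder is a manual decision.
        print(leftover)
```

A compressed file in the cache is not proof of corruption by itself, so the list is a set of candidates to check with !sym noisy rather than a list of files to delete blindly.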