Windows Service Crash Dumps in Vista
Loading Dump File
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Fri Sep 28 16:36:38.000 2007 (GMT+1)
System Uptime: 2 days 1:42:22.810
Process Uptime: 0 days 0:00:10.000
This dump file has an exception of interest stored in it
The stored exception information can be accessed via .ecxr
(13b0.d54): Access violation - code c0000005 (first/second chance not
Child-SP RetAddr Call Site
00000000`0012fab0 000007fe`fe276cee simple!service_ctrl+0x19
00000000`0012faf0 000007fe`fe2cea5d advapi32!ScDispatcherLoop+0x54c
A fault in any other service thread, for example the thread that the SCM starts for each
SERVICE_TABLE_ENTRY in the dispatch table, results in the default postmortem debugger
saving a crash dump on Windows Server 2003 x64, but not on Vista x64 or Vista x86 (32-bit):
void __cdecl main(int argc, char **argv)
{
    SERVICE_TABLE_ENTRY dispatchTable[] = {
        { TEXT(SZSERVICENAME), (LPSERVICE_MAIN_FUNCTION)service_main },
        { NULL, NULL }
    };
    if (!StartServiceCtrlDispatcher(dispatchTable))
        AddToMessageLog(TEXT("StartServiceCtrlDispatcher failed."));
}

void WINAPI service_main(DWORD dwArgc, LPTSTR *lpszArgv)
{
    // register our service control handler:
    //
    sshStatusHandle = RegisterServiceCtrlHandler(
        TEXT(SZSERVICENAME), service_ctrl);
    if (!sshStatusHandle) goto cleanup;
    // SERVICE_STATUS members that don't change in example
    //
    *(int *)NULL = 0;   // deliberate NULL pointer write that crashes the service thread
    … … …
}
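The per-entry behavior described above can be pictured with a small portable model. This is only a sketch: struct service_entry, demo_table, demo_service_main and dispatch_all are hypothetical names invented for illustration, not part of the Windows API, and the model calls each entry point directly where the real dispatcher starts a separate thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for SERVICE_TABLE_ENTRY: a service name and its entry point. */
typedef void (*service_entry_fn)(void);

struct service_entry {
    const char      *name;
    service_entry_fn entry;
};

int demo_calls = 0;                             /* counts invocations of the demo entry point */
void demo_service_main(void) { ++demo_calls; }

/* Illustrative table, terminated by { NULL, NULL } like dispatchTable above. */
const struct service_entry demo_table[] = {
    { "SimpleService", demo_service_main },
    { NULL, NULL }
};

/* Walk the table and call each entry point once.
   Assumption: the real dispatcher starts one thread per entry instead of calling inline. */
size_t dispatch_all(const struct service_entry *table)
{
    size_t started = 0;
    for (; table[started].name != NULL && table[started].entry != NULL; ++started)
        table[started].entry();
    return started;
}
```

Because every entry point runs on its own thread in the real service, a fault such as the NULL pointer write in service_main happens outside the main thread that sits in the dispatcher loop.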
It seems the only way to get a crash minidump for analysis is to copy it from the
report data, as I explained above:
Loading Dump File
Executable search path is:
Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Fri Sep 28 17:50:06.000 2007 (GMT+1)
System Uptime: 0 days 0:30:59.495
Process Uptime: 0 days 0:00:04.000
This dump file has an exception of interest stored in it
The stored exception information can be accessed via .ecxr
(d6c.fcc): Access violation - code c0000005 (first/second chance not
0 Id: d6c.cf4 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
Child-SP RetAddr Call Site
00000000`0012f978 00000000`777026da ntdll!NtReadFile+0xa
00000000`0012f980 000007fe`feb265aa kernel32!ReadFile+0x8a
00000000`0012fa10 000007fe`feb262e3 advapi32!ScGetPipeInput+0x4a
00000000`0012faf0 000007fe`feb7ea5d advapi32!ScDispatcherLoop+0x9a
00000000`0012fbf0 00000000`004019f5 advapi32!StartServiceCtrlDispatcherA+0x8d
00000000`0012fe70 00000000`00408bac simple!main+0x155
00000000`0012fec0 00000000`0040897e simple!__tmainCRTStartup+0x21c
00000000`0012ff30 00000000`7770cdcd simple!mainCRTStartup+0xe
00000000`0012ff60 00000000`7792c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
# 1 Id: d6c.fcc Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
Child-SP RetAddr Call Site
00000000`008eff00 000007fe`feb24bf5 simple!service_main+0x60
00000000`008eff30 00000000`7770cdcd advapi32!ScSvcctrlThreadW+0x25
00000000`008eff60 00000000`7792c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`008eff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
Spawning a custom thread that causes a NULL pointer access violation doesn’t result in a
crash dump on my Vista x86 and x64 systems either. Therefore it appears that no
automatic postmortem crash dumps are saved for native Windows services in Vista, unless
there is some setting that I missed. This might create problems for traditional third-party
technical support procedures. However, it appears that there is a possible solution with
Vista SP1 and Windows Server 2003 (page 606).
THE ROAD TO KERNEL SPACE
If you are developing and debugging user space applications (and/or doing crash
dump analysis in user space) and you want to understand Windows kernel dumps and
device drivers better (and probably start writing your own kernel tools), here is the
reading list I found the most effective:
0. Read and re-read the Windows Internals book in parallel with all the other
books. It shows you the big picture and some useful WinDbg commands and techniques,
but you need to read device driver books to fill the gaps and be confident in kernel
space.
1. Start with “The Windows 2000 Device Driver Book: A Guide for Programmers
(2nd Edition)”. This short book shows the basics, and you can start writing drivers and
kernel tools immediately.
2. Next read “Windows NT Device Driver Development” to consolidate your
knowledge.
3. Don’t stop here. Read “Developing Windows NT Device Drivers:
A Programmer’s Handbook”. This very good book explains everything in great detail and
with good pictures.
4. Continue with WDM drivers and a modern presentation: “Programming the
Microsoft Windows Driver Model, Second Edition”. It is a must-read even if your drivers
are not WDM.
5. Finally read “Developing Drivers with the Windows Driver Foundation”, as
this is the future and it also covers ETW (Event Tracing for Windows), WinDbg
extensions, PREfast and Static Driver Verifier.
Additional reading (not including DDK Help, which you will use anyway) can be
done in parallel after finishing the “Windows NT Device Driver Development” book:
1. OSR NT Insider articles: http://www.osronline.com
2. “Windows NT File System Internals”
3. “Rootkits: Subverting the Windows Kernel” shows the Windows kernel from
a hacker’s perspective.
MEMORY DUMP ANALYSIS INTERVIEW QUESTIONS
The following interview questions might be useful for assessing skill level in crash
dump analysis on Windows platforms. They could be useful for debugging interviews as
well.
1. What is FPO?
2. How many exceptions can be found in a crash dump?
3. You see the following message from WinDbg:
WARNING: Stack unwind information not available. Following frames may be wrong.
What would you do?
4. How would you find the spinlock implementation if you have a kernel dump?
5. What is OMAP?
6. What is full page heap?
7. The company name is missing from module information. How would you try to find
it?
8. What is the IDT?
9. How does a postmortem debugger work?
10. You’ve got a minidump of your application. How would you disassemble the
code?
11. Memory consumption is growing for an application. How would you discover
the leaking component?
12. What is IRQL?
13. When do you use the TEB?
14. You’ve got 200 process dumps from a server. You need to find a deadlock.
How would you do it?
15. You’ve got a complete memory dump from a server. You need to find a
deadlock. How would you do it?
16. What is the GC heap?
17. Your customer is reluctant to send a dump file due to security policies. What is
your next step?
18. What is a first chance exception?
MUSIC FOR DEBUGGING
Debugging and understanding multithreaded programs is hard, and sometimes it
requires running several execution paths mentally. Here listening to composers who use
multithreading in music can help. My favorite is J.S. Bach. Virtuoso and heroic music
helps me in live debugging too, and here my favorites are Chopin, Liszt and Beethoven.
Many software engineers listen to music when writing code, and I’m no
exception. However, I have found that not all music suitable for programming helps me
during debugging sessions.
Music for relaxation, quiet classical or modern music, helps me to think about
program design and write solid code. Music with several melodies played
simultaneously, and heroic and virtuoso works, help me to achieve a breakthrough and
find a bug. The latter kind of music also suits me when doing crash dump analysis or
problem troubleshooting.
PDBFINDER
Version 3.5 uses the new binary database format and achieves the following
results compared to the previous version 3.0.1:
2 times smaller database size
5 times faster database load time on startup!
It is fully backwards compatible with the 3.0.1 and 2.x database formats and silently
converts your old database to the new format on the first load.
Additionally, the new version fixes a bug in version 3.0.1 that sometimes manifested
when removing and then adding folders before building a new database, which
resulted in an incorrectly built database.
The next version, 4.0, is currently under development and will have the
following features:
The ability to open multiple databases
The ability to exclude certain folders during build to avoid excessive search
results output
Fully configurable OS and language search options (which are currently
disabled for the public version)
The PDBFinder upgrade is available for download from Citrix support:
http://support.citrix.com/article/CTX110629
WHEN A PROCESS DIES SILENTLY
There are cases when the default postmortem debugger doesn’t save a dump file.
This is because, on Windows prior to Vista, the default postmortem debugger is called
from the crashed application thread, and if the thread stack is exhausted or critical
thread data is corrupt, there is no user dump. On Vista the default postmortem debugger
is called from the WER (Windows Error Reporting) process WerFault.exe, so there is a
chance that it can save a user dump. During my experiments today on Windows 2003 (x64) I
found that if we have a stack overflow inside a 64-bit process, the process silently
dies. This doesn’t happen for 32-bit processes on the same server or on a native 32-bit
OS. Here is the added code from the modified default Win32 API project created in
void SoFunction()
{
    if (++dwSupressOptimization) {
        SoFunction();
        WndProc(0,0,0,0);
    }
}
The WndProc call is added to SoFunction to defeat an optimization in the
Release build, where the recursive call is transformed into a loop:
Therefore, without the added WndProc call or a more complicated SoFunction, there is no
stack overflow but a loop with 4294967295 (0xFFFFFFFF) iterations.
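The iteration count follows from unsigned wraparound: the pre-incremented counter is non-zero for every value from 1 up to the maximum, and the next increment wraps to zero and ends the loop. The sketch below is a scaled-down model that uses a 16-bit counter so that it finishes instantly. The names counter16 and calls_before_wrap16 are hypothetical, and the loop only mimics what an optimizer might produce from the recursion.

```c
#include <stdint.h>

/* 16-bit stand-in for the 32-bit dwSupressOptimization counter. */
uint16_t counter16 = 0;

/* Loop form of:  void f(void) { if (++counter16) f(); }
   Returns how many times the "recursive call" branch is taken. */
uint32_t calls_before_wrap16(void)
{
    uint32_t calls = 0;
    counter16 = 0;
    while (++counter16 != 0)   /* wraps from 0xFFFF to 0 and stops */
        ++calls;
    return calls;
}
```

With a 16-bit counter the branch is taken 0xFFFF (65535) times. The same reasoning with a 32-bit counter starting at zero gives the 0xFFFFFFFF iterations quoted above.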
If we compile an x64 project with the WndProc call included in SoFunction and run it,
we never get a dump file from any default postmortem debugger, although the
TestDefaultDebugger64 (page 641) tool crashes with a dump. We can also observe
strange behavior: the application disappears only during the second window repaint,
although it should crash immediately when we launch it and the main window is shown.
What we see is that when we launch the application, it runs and the main window is
visible. Only when we force it to repaint, for example by minimizing and then maximizing
it, does it disappear from the screen and the process list. If we launch 64-bit WinDbg,
then load and run our application, we hit the first chance exception:
0:000> g
(159c.fc4): Stack overflow - code c00000fd (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
00000001`4000130e mov qword ptr [rsp+20h],rax
00000001`40001313 add dword ptr [StackOverflow!dwSupressOptimization
00000001`4000133f call StackOverflow!__security_check_cookie (00000001`40001360)
00000001`40001344 add rsp,38h
00000001`40001348 ret
However, this guard page is not the last stack page, as can be seen from the TEB and
the current RSP address (0x33fe0):
If we continue execution and force the main application window to invalidate
(repaint) itself, we get another first chance exception instead of a second chance one:
0:000> g
(159c.fc4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
Therefore we expect the second chance exception at the same address here and
we get it indeed when we continue execution:
0:000> g
(159c.fc4): Access violation - code c0000005 (!!! second chance !!!)
StackOverflow!SoFunction+0x22:
00000001`40001322 call StackOverflow!SoFunction (00000001`40001300)
Now we see why the process died silently. There was no stack space left for the
exception dispatch handler functions, and therefore none for the default unhandled
exception filter that launches the default postmortem debugger to save a process dump.
So it looks like on x64 Windows, when our process had the first chance stack overflow
exception, there was no second chance exception afterwards; after the first chance
stack overflow exception was handled, process execution resumed and finally hit its
thread stack limit. This doesn’t happen with 32-bit processes, even on x64 Windows,
where an unhandled first chance stack overflow exception results in an immediate second
chance stack overflow exception at the same stack address, and therefore there is
sufficient room for the local variables of the exception handler and filter functions.
This is an example of what happened before the exception handling changes in Vista.
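The amount of stack left at the moment of the first chance exception is simple address arithmetic on the stack pointer and the TEB stack limit. The sketch below shows that arithmetic. The RSP value 0x33fe0 comes from the session above, but the stack limit 0x31000 in the usage note is a made-up value for illustration only, and the function names and the 4 KB page size constant are assumptions of this sketch.

```c
#include <stdint.h>

#define STACK_PAGE_SIZE 0x1000u   /* assumed 4 KB pages */

/* Bytes between the current stack pointer and the lowest stack address
   (the TEB StackLimit). The stack grows towards lower addresses. */
uint64_t stack_bytes_left(uint64_t rsp, uint64_t stack_limit)
{
    return rsp > stack_limit ? rsp - stack_limit : 0;
}

/* Whole pages that still lie below the page containing rsp. */
uint64_t stack_pages_below(uint64_t rsp, uint64_t stack_limit)
{
    uint64_t rsp_page = rsp & ~(uint64_t)(STACK_PAGE_SIZE - 1);
    return rsp_page > stack_limit ? (rsp_page - stack_limit) / STACK_PAGE_SIZE : 0;
}
```

For example, stack_bytes_left(0x33fe0, 0x31000) gives 0x2fe0 bytes and stack_pages_below(0x33fe0, 0x31000) gives 2 pages. Once the resumed recursion consumes that remainder, nothing is left for the exception dispatch and filter functions.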
ASLR: ADDRESS SPACE LAYOUT RANDOMIZATION
Vista has the new ASLR feature:
Load address randomization (/dynamicbase linker option)
Stack address randomization (/dynamicbase linker option)
Heap randomization
The first randomization changes addresses across Vista reboots. The second
randomization happens every time we launch an application linked with the /dynamicbase
option. The third randomization happens every time we launch an application, whether or
not it was linked with the /dynamicbase option, as we will see below.
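One way to observe the three kinds of randomization without a debugger is to have the program record one address from each region and compare the values between launches and reboots. The following is a minimal portable sketch. The names address_sample, take_address_sample and print_address_sample are hypothetical, and whether the values actually change depends on the OS and on how the program was linked.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct address_sample {
    uintptr_t code;    /* address inside the main module (load address) */
    uintptr_t stack;   /* address of a local variable (stack) */
    uintptr_t heap;    /* address of a heap allocation */
};

/* Fill *out with one address from each region; returns 0 on success. */
int take_address_sample(struct address_sample *out)
{
    int local = 0;
    void *block = malloc(16);
    if (block == NULL)
        return -1;
    out->code  = (uintptr_t)&take_address_sample;
    out->stack = (uintptr_t)&local;
    out->heap  = (uintptr_t)block;
    free(block);
    return 0;
}

/* Print the sample so that separate runs can be compared by eye. */
void print_address_sample(const struct address_sample *s)
{
    printf("code=0x%" PRIxPTR " stack=0x%" PRIxPTR " heap=0x%" PRIxPTR "\n",
           s->code, s->stack, s->heap);
}
```

Calling take_address_sample and print_address_sample once per run and collecting the lines gives the same kind of comparison as the WinDbg sessions that follow.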
Let’s check the ASLR feature by attaching WinDbg to calc, notepad and the pre-Vista
application TestDefaultDebugger (page 641). Obviously native Vista applications use
ASLR.
A comparison between two calc.exe processes inspected separately before and
after a reboot shows that the main module and system DLLs have different load addresses:
The main module address has a different third byte across reboots. I believe that 0x00
is not allowed, because otherwise we would have a load address of 0x00000000. Therefore
we have 255 unique load addresses chosen randomly.
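The count of 255 can be checked with a small model of that observation. This sketch assumes, as the paragraph above does, that only the third byte of a 32-bit load address varies and that the value 0x00 is excluded. It is a model of the author's reasoning, not documented loader behavior, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed model: the load address is 0x00NN0000, where NN is the random byte. */
uint32_t candidate_load_base(uint8_t slot)
{
    return (uint32_t)slot << 16;
}

/* Count the byte values that give a non-zero load address. */
unsigned count_candidate_bases(void)
{
    unsigned count = 0;
    for (unsigned slot = 0; slot <= 0xFF; ++slot)
        if (candidate_load_base((uint8_t)slot) != 0)
            ++count;
    return count;
}
```

Of the 256 possible byte values only 0x00 maps to address zero, which leaves 255 candidates under this model.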
Stack addresses are different:
0007fbe4 75e4199a ntdll!KiFastSystemCallRet
0007fbe8 75e419cd USER32!NtUserGetMessage+0xc
If we look inside the TEB we see that the pointers to the exception handler list are
different, and the stack bases are different too:
However, if we look at old applications that weren’t linked with the /dynamicbase
option, we see that the main module and old DLL base addresses are the same:
0:000> lm
start end module name
00400000 00435000 TestDefaultDebugger
20000000 2000d000 LvHook
To summarize the different alternatives I created the following table, where:
“New” column: processes linked with the /dynamicbase option, no reboot
“New/Reboot” column: processes linked with the /dynamicbase option, reboot
“Old” column: old processes, no reboot
“Old/Reboot” column: old processes, reboot