Debugging C and C++ code in a Unix environment
J.H.M. Dassen jdassen@wi.LeidenUniv.nl
I.G. Sprinkhuizen-Kuyper kuyper@wi.LeidenUniv.nl
by J.H.M. Dassen and I.G. Sprinkhuizen-Kuyper
Copyright © 1998-1999 by J.H.M. Dassen (Ray) and I.G. Sprinkhuizen-Kuyper
Copyright and Permission Notice
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved.
Table of Contents
Abstract
1 Introduction
2 Conventions
3 Aspects of debugging C and C++ code
Noticing and localising a bug
Understanding a bug
Repairing a bug
Types of bugs
C and C++ specific problems
Preprocessor
Strong systems dependency
Weak type system
Explicit storage allocation and deallocation
Name space pollution
Incremental building/linking
The build process
Core dumps
Debugging techniques
Using the compiler’s features
The RTFM technique
printf() debugging
Assertions: defensive programming
ANWB debugging
Code grinding (code walk through)
Tools
The editor
A version management system
The debugger
Memory allocation debugging tools
System call tracers
Profilers
Conclusions
Bibliography
A.
An example makefile
Documentation formats
Manual pages
Info documentation
HTML and PDF
Flat ASCII, DVI, PostScript etc.
Abstract
This document describes several techniques and tools for debugging code in C-like languages in a Unix environment.
Chapter 1 Introduction
Debugging is the art of removing bugs from software. The software may be code, documentation, or any other intellectual product. Here, we will look at the debugging of computer programs (or libraries) written in C or C++ in a Unix environment. Most of it is also applicable to other compiled procedural and object-oriented languages like Pascal, Modula and Objective C.
We will mostly focus on techniques and tools to assist in debugging. Of course, it is better to prevent bugs from slipping into your code in the first place. Sometimes it is difficult to distinguish between good coding practices and good debugging practices, because good debugging practices often involve preparation and prevention. So, we will also discuss some good coding practices that you should consider adopting. These practices will not make your programs bug-free, but they will diminish the occurrence of certain types of bugs, while preparing you better for dealing with the remaining ones.
It is our experience that many people waste large amounts of time on localising bugs that are quite easy to fix once they are found, because they are not aware of, or do not know how to use, the tools, techniques and practices available to them.
Our goal is to help you avoid wasting your time in this fashion. We hope you will invest time to study the material covered here; we are convinced this investment will pay off.
Chapter 2 Conventions
This paper follows some Unix conventions: commands and names of manual pages are typeset in a distinct font, and references to manual pages are written like ls(1), where the manual section is indicated in parentheses. Also, some of the terminology (‘foo’, ‘bar’, ‘RTFM’) comes from Unix hackerdom; see [JARGON] if you are interested in it.
Chapter 3 Aspects of debugging C and C++ code
Debugging C and C++ code entails noticing, localising, understanding and repairing bugs.
Noticing and localising a bug
You might think that noticing a bug is easy: you know what your code should do, and you notice that it does not do that. This apparent ease is deceptive. Noticing a bug involves testing. Testing is best done in a disciplined fashion and, wherever possible, in an automated fashion. For certain types of programs (e.g. compilers) it is relatively easy to construct tests (input plus expected output or result) and to run them automatically, say after each build.
You should prepare tests carefully. Make sure that if a test fails, you can see what goes wrong.
In a Unix system, a bug often manifests itself as a program crash, leaving a core dump. In the section called Core dumps, we will see what a core dump is, and how it can help you in debugging your code.
Understanding a bug
You should make sure that you understand a bug fully before you attempt to fix it. Ask yourself the following questions:
• Have I really found the cause of the problem I observed, or is this a mere symptom?
• Have I made similar mistakes (especially wrong assumptions) elsewhere in the code?
• Is this cause just a programming error, or is there a more fundamental problem (e.g. the algorithm is incorrect)?
Repairing a bug
Repairing a bug is more than modifying code. Make sure you document your fix properly in the code, and test it properly.
After repairing a bug, ask yourself what you can learn from it:
• How did I notice this bug? This might help you to write a test case that detects it if it slips in again.
• How did I track it down? This will give you better insight into which approach to take in case you encounter similar symptoms again.
• What type of bug was it (see the section called Types of bugs)? Do I encounter this type often? If so, what can I do to prevent it from recurring?
What you learn is probably valuable not only to you in developing this particular piece of code. Try to communicate what you learned to your colleagues, for instance by writing it down in a pattern-like fashion (e.g. ‘IF you find your program foos bars AND it does not foo bazs THEN try frobbing it’). Quite often, we find that one of the main reasons why tracking down a bug takes so long is that we have made unjustified assumptions about parts of our code.
Types of bugs
• Build errors. Some errors can result from using object files that haven’t been rebuilt after a change that affects them. Make sure you use a Makefile, and that it accurately reflects the dependencies involved in building your project. See the section called An example makefile in Appendix A for a way to track dependencies automatically.
• Basic semantic bugs, such as using uninitialised variables, dead code and certain type problems. A compiler can often bring these to your attention, but it must be told to do so explicitly (e.g. through warning and optimisation flags; see the section called Using the compiler’s features).
• Semantic bugs, such as using the wrong variable or using ‘&’ instead of ‘&&’. No compiler or other tool can reliably find these; you’ll have to do some thinking here. Stepping through your program with a debugger can help.
Note that there are many ways of classifying bugs, most of which are orthogonal to each other. For example, hackers tend to distinguish between Bohr bugs and Heisenbugs ([JARGON]). Bohr bugs are ‘reliable’ bugs: given a particular input, they will always manifest themselves. Heisenbugs are bugs that are difficult to reproduce reliably; they appear to depend on the phase of the moon (that is, on environmental factors like the time, a particular memory allocation, etc.). A Heisenbug is very often the result of pointer errors: using memory that is not allocated. So use tools (e.g. Electric Fence; see the section called Memory allocation debugging tools) to check all pointers and array boundaries. (Another common cause is the use of uninitialised variables.)
C and C++ specific problems
There are some features of the C and C++ languages and the associated build process that often lead to problems.
Preprocessor
C and C++ use a preprocessor to expand macros, to declare dependencies and import declarations, and to do conditional compilation. In itself, this is quite reasonable. You should realise, however, that all of these are done on a textual level: the C/C++ preprocessor does not understand C or C++ syntax, types or scope.
This can make it difficult to track down missing declarations, it can lead to semantic problems because of macro expansion, and it can cause other subtle problems.
If you suspect a problem due to preprocessing, check out the preprocessor’s manual (e.g. [CPP]) and let it expand your file for examination.
Strong systems dependency
C was developed for use as a systems programming language. Both C and C++ can give you access to a lot of operating system functionality. Unfortunately, there are a lot of small but significant differences among various Unix systems:
• Some system calls are not available on all systems.
• Some system calls and library functions are defined in different header files on different systems.
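• There may be differing semantics for particular routines. For example, on Sys V-like systems, a signal handler installed with signal() is reset to the default action when the signal is delivered, so it has to be reinstalled; on BSD-like systems, a signal handler stays in place until explicitly removed.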
Also, the size and representation of some of C’s and C++’s basic types depend on the underlying system. As a C or C++ programmer, you should be aware of what the C or C++ standard leaves undefined or implementation-defined, and which therefore depends on the system or compiler. There are standard ways to overcome some of these problems, like using sizeof instead of the concrete size of a type on the current system.
Weak type system
C and C++ have a type system, but it is very weak. You can do all kinds of conversions, many of which can be system dependent or meaningless. Also, the compiler can do some implicit conversions that may cause havoc.
Most errors due to the weak type system can be nipped in the bud by doing static analysis early; see the section called Using the compiler’s features.
Explicit storage allocation and deallocation
In C and C++, you have to explicitly allocate and deallocate dynamic storage through malloc and free (for C) and through new and delete (for C++). If memory (de)allocation is done incorrectly, it can cause problems at run time such as memory corruption and memory leaks (the memory use of a program keeps on increasing during execution).
Common errors are:
• Trying to use memory that has not been allocated yet
• Trying to access memory that has been deallocated already
• Deallocating memory twice
These errors are difficult to track down and correct without using proper tools; see the section called Memory allocation debugging tools.
Name space pollution
In C and C++, programmers commonly make no attempt to prevent name space pollution (name conflicts), although a few simple habits help:
• Use the static keyword to indicate functions and variables whose scope is restricted to the current file.
• Use as few global variables and functions as possible. If you have to use a large number of them, prefix their names consistently (e.g. MYPROJECT_someglobal).
Incremental building/linking
C and C++ code can be built incrementally; usually make is used to specify dependencies among files for a build. If a Makefile does not specify dependencies properly, you can end up with executables linked to old versions of modules, which can be buggy or incompatible with recently introduced changes in other modules.
The build process
Bugs you encounter may not be due to your C or C++ code; they might be the result of how your executable/library was built. Make sure that you understand how the build process is organised.
You should use a Makefile. A Makefile describes how to build your project: it lists the files involved in your project, their interdependencies, and how a tool should build intermediary files and the end product. Make sure you have listed all dependencies; missing even a single dependency can lead to subtle problems.
make is a powerful tool, and it pays off to acquaint yourself with it well. For instance, in general you should not list compilation lines directly. GNU make has some built-in rules (so-called implicit rules) on how, say, .o files are built from .c files. To use those rules, you only specify the dependencies (e.g. foo.o: foo.c foo.h bar.h, with foo.cc instead of foo.c for C++ programs), and no build rule. The implicit rules have a number of variables that you can set (e.g. CC for the C compiler and CXX for the C++ compiler, CFLAGS and CXXFLAGS for the respective compilation flags, and LDLIBS or the older LOADLIBES for the libraries). Using the implicit rules makes your makefiles shorter, easier to read and easier to modify. See [MAKETUT], [MAKETUT2] and [MAKE] for details on make. The GNU make documentation [MAKE] contains a list of the implicit rules it supports, and the variables they use.
The linker combines a number of object files and libraries to produce an executable or library. If this executable or library needs no external libraries at run time, it is called statically linked; otherwise it is called dynamically linked.
Core dumps
A core dump is a snapshot of the state of a program at the moment it is aborted by the operating system (e.g. for attempting to violate the memory protection). A normal core dump is not very helpful unless you are an expert. In the section called The debugger, we will see how to make core dumps more helpful for debugging.
By default, core dumps do not contain all the information you’d like them to. For example, a core dump can tell you that you were dereferencing a pointer at memory location 0x12345 while executing the instruction at 0x45678. You’d probably like to see a message that means more to you (‘The program was aborted while attempting to dereference foo, which was NULL, at bar.c line 23’). This is possible, but it requires you to include such information in advance (for instance, by compiling with gcc’s -g option, which stores debugging information in the executable).
Also, note that a core dump is a snapshot; it does not include the history of how your program came to the problematic state. What a core dump shows you is a manifestation of a bug; the point where a program dumps core is not always the location of the bug itself, which may be located 100,000 instructions back in time. Often, you can reconstruct the history of a run from a core dump, but this is difficult. printf debugging (see the section called printf() debugging) and possibly system call tracing (see the section called System call tracers) are useful techniques for doing this. Using a debugger (see the section called The debugger) is advised.
Debugging techniques
This section describes a number of debugging techniques, ranging from reading manuals to using tools.
Using the compiler’s features
A good compiler can do a good deal of static analysis of your code: the analysis of those aspects of a piece of code that can be studied without executing that code.
Static analysis can help in detecting a number of basic semantic problems such as type mismatches and dead code.
For gcc (the GNU C compiler) there are a number of options that affect what static analysis gcc does and what results will be shown. There are two types of options:
Warning options
gcc has a great number of warning flags. Most have the form -Wphrase. You should pick the ones relevant to you at the start of coding and put them into your Makefile (use the implicit rules, and put them in the CFLAGS variable). Note that -Wall does not switch on all warnings; it enables a set of warnings that gcc’s developers consider useful under nearly all circumstances. In addition to -Wall, we recommend at least the following warnings when writing new code: -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wstrict-prototypes (the last one applies to C only).
As an example, the following code will result in a warning, because the function can return without returning a value (whenever a is not positive):
    int foo(int a)
    {
        if (a > 0)
            return a;
    }
Optimisation flags
gcc also supports a number of optimisations. Some of these trigger gcc to do extensive flow analysis of your code, resulting in, for example, dead code removal. Some warnings, notably those about possibly uninitialised variables, rely on this analysis and are only fully effective when optimisation is enabled. For normal use, we recommend -O2. Do not use higher optimisation levels unless you know what you are doing; the higher levels can contain experimental optimisations which could generate bad code. Also note that on some systems, enabling optimisation makes debugging using a debugger virtually impossible.
For full documentation of these options, see the chapter ‘GNU CC Command Options’ in [GCC].
The RTFM technique
RTFM stands for Read The Fine Manual. Make sure you take the time to find relevant documentation for the task at hand, i.e. the documentation of the tools (not only the compiler, but also make, the preprocessor and the linker), libraries and algorithms you are expected to use, such as