Prinz, Crawford - C in a Nutshell 2006

This is an English-language book on information technology for students and anyone with a passion for the field. It presents the theory and programming methods of the C and C++ languages.

C IN A NUTSHELL

Other resources from O’Reilly

Related titles

C Pocket Reference

Practical C Programming

Secure Programming Cookbook for C and C++

Programming Embedded Systems with C and C++

Programming with GNU Software

Objective-C Pocket Reference

Prefactoring

Practical Development Environments

oreilly.com

oreilly.com is more than a complete catalog of O’Reilly books. You’ll also find links to news, events, articles, weblogs, sample chapters, and code examples.

oreillynet.com is the essential portal for developers interested in open and emerging technologies, including new platforms, programming languages, and operating systems.

Conferences

O’Reilly brings diverse innovators together to nurture the ideas that spark revolutionary industries. We specialize in documenting the latest tools and systems, translating the innovator’s knowledge into useful skills for those in the trenches. Visit conferences.oreilly.com for our upcoming events.

Safari Bookshelf (safari.oreilly.com) is the premier online reference library for programmers and IT professionals. Conduct searches across more than 1,000 books. Subscribers can zero in on answers to time-critical questions in a matter of seconds. Read the books on your shelf from cover to cover or simply flip to the page you need. Try it today for free.


Peter Prinz and Tony Crawford

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo


C in a Nutshell

by Peter Prinz and Tony Crawford

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Jonathan Gennick

Production Editor: A J Fox

Cover Designer: Karen Montgomery

Interior Designer: David Futato

Printing History:

December 2005: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. The In a Nutshell series designations, C in a Nutshell, the image of a cow, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-0-596-00697-6

Table of Contents

4 Type Conversions

5 Expressions and Operators

10 Structures, Unions, and Bit-Fields

12 Dynamic Memory Management

14 Preprocessing Directives

Part II Standard Library

15 The Standard Headers

17 Standard Library Functions

Part III Basic Tools

18 Compiling with GCC

19 Using make to Build C Programs

Preface

This book is not an introduction to programming in C. Although it covers the fundamentals of the language, it is not organized or written as a tutorial. If you are new to C, we assume that you have read at least one of the many introductory books, or that you are familiar with a related language, such as Java or C++.

How This Book Is Organized

This book is divided into three parts. The first part describes the C language in the strict sense of the term; the second part describes the standard library; and the third part describes the process of compiling and testing programs with the popular tools in the GNU software collection.


Part I

Part I, which deals with the C language, includes Chapters 1 through 14. After Chapter 1, which describes the general concepts and elements of the language, each chapter is devoted to a specific topic, such as types, statements, or pointers. Although the topics are ordered so that the fundamental concepts for each new topic have been presented in an earlier chapter (types, for example, are described before expressions and operators, which come before statements, and so on), you may sometimes need to follow references to later chapters to fill in related details. For example, some discussion of pointers and arrays is necessary in Chapter 5 (which covers expressions and operators), even though pointers and arrays are not described in full detail until Chapters 8 and 9.

Chapter 1, Language Basics

Describes the characteristics of the language and how C programs are structured and compiled. This chapter introduces basic concepts such as the translation unit, character sets, and identifiers.

Chapter 2, Types

Provides an overview of types in C and describes the basic types, the type void, and enumerated types.

Chapter 3, Literals

Describes numeric constants, character constants, and string literals, including escape sequences.

Chapter 4, Type Conversions

Describes implicit and explicit type conversions, including integer promotion and the usual arithmetic conversions.

Chapter 5, Expressions and Operators

Describes the evaluation of expressions, all the operators, and their compatible operands.

Chapter 9, Pointers

Describes the definition and use of pointers to objects and functions.

Chapter 10, Structures, Unions, and Bit-Fields

Describes the organization of data in these user-defined derived types


Chapter 11, Declarations

Describes the general syntax of a declaration, identifier linkage, and the storage duration of objects.

Chapter 12, Dynamic Memory Management

Describes the standard library’s dynamic memory management functions, illustrating their use in a sample implementation of a generalized binary tree.

Chapter 13, Input and Output

Describes the C concept of input and output, with an overview of the use of the standard I/O library.

Chapter 14, Preprocessing Directives

Describes the definition and use of macros, conditional compiling, and all the other preprocessor directives and operators.

Part II

Part II, consisting of Chapters 15, 16, and 17, is devoted to the C standard library. It provides an overview of standard headers and also contains a detailed function reference.

Chapter 15, The Standard Headers

Describes contents of the headers and their use. The headers contain all of the standard library’s macros and type definitions.

Chapter 16, Functions at a Glance

Provides an overview of the standard library functions, organized by areas of application, such as “Mathematical Functions,” “Time and Date Functions,” and so on.

Chapter 17, Standard Library Functions

Describes each standard library function in detail, in alphabetical order, and contains examples to illustrate the use of each function.

Part III

The third part of this book provides the necessary knowledge of the C programmer’s basic tools: the compiler, the make utility, and the debugger. The tools described here are those in the GNU software collection.

Chapter 18, Compiling with GCC

Describes the principal capabilities that the widely used compiler offers for C programmers.

Chapter 19, Using make to Build C Programs

Describes how to use the make program to automate the compiling process for large programs.

Chapter 20, Debugging C Programs with GDB

Describes how to run a program under the control of the GNU debugger and how to analyze programs’ runtime behavior to find logical errors.


Further Reading

In addition to works mentioned at appropriate points in the text, there are a number of resources for readers who want more technical detail than even this book can provide. The international working group on C standardization has an official home page at http://www.open-std.org/jtc1/sc22/wg14, with links to the latest version of the C99 standard and current projects of the working group.

For readers who are interested in not only the what and how of C, but also the why, the WG14 site also has a link to the “C99 Rationale”: this is a nonnormative but current document that describes some of the motivations and constraints involved in the standardization process. The C89 Rationale is online at http://www.lysator.liu.se/c/rat/title.html. Furthermore, for those who may wonder how C “got to be that way” in the first place, the originator of C, Dennis Ritchie, has an article titled “The Development of the C Language” as well as other historical documents on his Bell Labs web site, http://cm.bell-labs.com/cm/cs/who/dmr.

Readers who want details on floating-point math beyond the scope of C may wish to start with David Goldberg’s thorough introduction, “What Every Computer Scientist Should Know About Floating-Point Arithmetic,” currently available online at http://docs.sun.com/source/806-3568/ncg_goldberg.html.

Conventions Used in This Book

The following typographical conventions are used in this book:

Constant width bold

Highlights the function or statement under discussion in code examples. In compiler, make, and debugger sessions, this font indicates command input to be typed literally by the user.

Constant width italic

Indicates parameters in function prototypes, or placeholders to be replaced with your own values.

Plain text

Indicates keys such as Return, Tab, and Ctrl


This icon signifies a tip, suggestion, or general note

This icon signifies a warning or caution

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “C in a Nutshell by Peter Prinz

If you feel that your use of code examples falls outside fair use or the permission

given here, feel free to contact us at permissions@oreilly.com.

Safari® Enabled

When you see a Safari® Enabled icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.

Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.

Your Questions and Comments

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.

1005 Gravenstein Highway North

Sebastopol, CA 95472

(800) 998-9938 (in the United States or Canada)

(707) 829-0515 (international or local)

(707) 829-0104 (fax)


Both of us want to thank Jonathan Gennick, our editor, for originally bringing us together and starting us off on this book, and for all his guidance along the way. We also thank our technical reviewers, Matt Crawford, David Kitabjian, and Chris LaPre, for their valuable criticism of our manuscript, and we’re grateful to our production editor, Abby Fox, for all her attention to making our book look good.

Peter

I would like to thank Tony first of all for the excellent collaboration. My heartfelt thanks also go to all my friends for the understanding they showed again and again when I had so little time for them. Last but not least, I dedicate this book to my daughters, Vivian and Jeanette, both of them now students of computer science, who strengthened my ambition to carry out this book project.

Tony

I have enjoyed working on this book as a very rewarding exercise in teamwork. I thank Peter for letting me take all the space I could fill in this project. The opportunity to work with my brother Matt in the review phase was particularly gratifying.

Part I Language

Chapter 1: Language Basics

• Source code portability

• The ability to operate “close to the machine”

• Efficiency

As a result, the developers of Unix were able to write most of the operating system in C, leaving only a minimum of system-specific hardware manipulation to be coded in assembler.

C’s ancestors are the typeless programming languages BCPL (the Basic Combined Programming Language), developed by Martin Richards; and B, a descendant of BCPL, developed by Ken Thompson. A new feature of C was its variety of data types: characters, numeric types, arrays, structures, and so on. Brian Kernighan and Dennis Ritchie published an official description of the C programming language in 1978. As the first de facto standard, their description is commonly referred to simply as “K&R.”* C owes its high degree of portability to a compact core language that contains few hardware-dependent elements. For example, the C language proper has no file access or dynamic memory management statements. In fact, there aren’t even any statements for console input and output. Instead, the extensive standard C library provides the functions for all of these purposes.

* The second edition, revised to reflect the first ANSI C standard, is available as The C Programming Language, 2nd ed., by Brian W. Kernighan and Dennis M. Ritchie (Englewood Cliffs, N.J.: Prentice Hall, 1988).

This language design makes the C compiler relatively compact and easy to port to new systems. Furthermore, once the compiler is running on a new system, you can compile most of the functions in the standard library with no further modification, because they are in turn written in portable C. As a result, C compilers are available for practically every computer system.

Because C was expressly designed for system programming, it is hardly surprising that one of its major uses today is in programming embedded systems. At the same time, however, many developers use C as a portable, structured high-level language to write programs such as powerful word processor, database, and graphics applications.

The Structure of C Programs

The procedural building blocks of a C program are functions, which can invoke one another. Every function in a well-designed program serves a specific purpose. The functions contain statements for the program to execute sequentially, and statements can also be grouped to form block statements, or blocks. As the programmer, you can use the ready-made functions in the standard library, or write your own whenever no standard function fulfills your intended purpose. In addition to the standard C library, there are many specialized libraries available, such as libraries of graphics functions. However, by using such nonstandard libraries, you limit the portability of your program to those systems to which the libraries themselves have been ported.

Every C program must define at least one function of its own, with the special name main( ): this is the first function invoked when the program starts. The main( ) function is the program’s top level of control, and can call other functions as subroutines.

Example 1-1 shows the structure of a simple, complete C program. We will discuss the details of declarations, function calls, output streams and more elsewhere in this book. For now, we are simply concerned with the general structure of the C source code. The program in Example 1-1 defines two functions, main( ) and circularArea( ). The main( ) function calls circularArea( ) to obtain the area of a circle with a given radius, and then calls the standard library function printf( ) to output the results in formatted strings on the console.

Example 1-1. A simple C program

// circle.c: Calculate and print the areas of circles
#include <stdio.h>                // Preprocessor directive

double circularArea( double r );  // Function declaration (prototype form)

int main( )                       // Definition of main( ) begins
{
  double radius = 1.0, area = 0.0;

  printf( "    Areas of Circles\n\n" );
  printf( "    Radius       Area\n"
          "---------------------\n" );

  area = circularArea( radius );
  printf( "%10.1f %10.2f\n", radius, area );

  radius = 5.0;
  area = circularArea( radius );
  printf( "%10.1f %10.2f\n", radius, area );

  return 0;
}

// The function circularArea( ) calculates the area of a circle
// Parameter:    The radius of the circle
// Return value: The area of the circle

double circularArea( double r )   // Definition of circularArea( ) begins
{
  const double pi = 3.1415926536; // Pi is a constant
  return pi * r * r;
}

Note that the compiler requires a prior declaration of each function called. The prototype of circularArea( ) in the third line of Example 1-1 provides the information needed to compile a statement that calls this function. The prototypes of standard library functions are found in standard header files. Because the header file stdio.h contains the prototype of the printf( ) function, the preprocessor directive #include <stdio.h> declares the function indirectly by directing the compiler’s preprocessor to insert the contents of that file. (See also the section “How the C Compiler Works,” at the end of this chapter.)

You may arrange the functions defined in a program in any order. In Example 1-1, we could just as well have placed the function circularArea( ) before the function main( ). If we had, then the prototype declaration of circularArea( ) would be superfluous, because the definition of the function is also a declaration.

Function definitions cannot be nested inside one another: you can define a local variable within a function block, but not a local function.

Source Files

The function definitions, global declarations and preprocessing directives make up the source code of a C program. For small programs, the source code is written in a single source file. Larger C programs consist of several source files. Because the function definitions generally depend on preprocessor directives and global declarations, source files usually have the following internal structure:

consistency, you can write this information just once in a separate header file, and then reference the header file using an #include directive in each source code file. Header files are customarily identified by the filename suffix .h. A header file explicitly included in a C source file may in turn include other files.

Example 1-2 The first source file, containing the main( ) function

// circle.c: Prints the areas of circles.

// Uses circulararea.c for the math

Example 1-3 The second source file, containing the circularArea( ) function

// circulararea.c: Calculates the areas of circles.

// Called by main( ) in circle.c

double circularArea( double r )

{

/* As in Example 1-1 */

}


Each C source file, together with all the header files included in it, makes up a translation unit. The compiler processes the contents of the translation unit sequentially, parsing the source code into tokens, its smallest semantic units, such as variable names and operators. See the section “Tokens,” at the end of this chapter for more detail.

Any number of whitespace characters can occur between two successive tokens, allowing you a great deal of freedom in formatting the source code. There are no rules for line breaks or indenting, and you may use spaces, tabs, and blank lines liberally to format “human-readable” source code. The preprocessor directives are slightly less flexible: a preprocessor directive must always appear on a line by itself, and no characters except spaces or tabs may precede the hash mark (#) that begins the line.

There are many different conventions and “house styles” for source code formatting. Most of them include the following common rules:

• Start a new line for each new declaration and statement.

• Use indentation to reflect the nested structure of block statements.

Comments

You should use comments generously in the source code to document your C programs. There are two ways to insert a comment in C: block comments begin with /* and end with */, and line comments begin with // and end with the next new line character.

You can use the /* and */ delimiters to begin and end comments within a line, and to enclose comments of several lines. For example, in the following function prototype, the ellipsis (...) signifies that the open( ) function has a third, optional parameter. The comment explains the usage of the optional parameter:

int open( const char *name, int mode, ... /* int permissions */ );

You can use // to insert comments that fill an entire line, or to write source code in a two-column format, with program code on the left and comments on the right:

const double pi = 3.1415926536; // Pi is constant

These line comments were officially added to the C language by the C99 standard, but most compilers already supported them even before C99. They are sometimes called “C++-style” comments, although they originated in C’s forerunner, BCPL.

Inside the quotation marks that delimit a character constant or a string literal, the characters /* and // do not start a comment. For example, the following statement contains no comments:

printf( "Comments in C begin with /* or //.\n" );

The only thing that the preprocessor looks for in examining the characters in a comment is the end of the comment; thus it is not possible to nest block comments. However, you can insert /* and */ to comment out part of a program that contains line comments:

/* Temporarily removing two lines:

const double pi = 3.1415926536; // Pi is constant

area = pi * r * r; // Calculate the area

Temporarily removed up to here */

If you want to comment out part of a program that contains block comments, youcan use a conditional preprocessor directive (described in Chapter 14):

#if 0

const double pi = 3.1415926536; /* Pi is constant */

area = pi * r * r; /* Calculate the area */

#endif

The preprocessor replaces each comment with a space. The character sequence min/*max*/Value thus becomes the two tokens min Value.

Character Sets

C makes a distinction between the environment in which the compiler translates the source files of a program (the translation environment) and the environment in which the compiled program is executed (the execution environment). Accordingly, C defines two character sets: the source character set is the set of characters that may be used in C source code, and the execution character set is the set of characters that can be interpreted by the running program. In many C implementations, the two character sets are identical. If they are not, then the compiler converts the characters in character constants and string literals in the source code into the corresponding elements of the execution character set.

Each of the two character sets includes both a basic character set and extended characters. The C language does not specify the extended characters, which are usually dependent on the local language. The extended characters together with the basic character set make up the extended character set.

The basic source and execution character sets both contain the following types of characters:

The letters of the Latin alphabet

The five whitespace characters

Space, horizontal tab, vertical tab, new line, and form feed

The basic execution character set also includes four nonprintable characters: the null character, which acts as the termination mark in a character string; alert; backspace; and carriage return. To represent these characters in character and string literals, type the corresponding escape sequences beginning with a backslash: \0 for the null character, \a for alert, \b for backspace, and \r for carriage return. See Chapter 3 for more details.

The actual numeric values of characters (the character codes) may vary from one C implementation to another. The language itself imposes only the following conditions:

• Each character in the basic character set must be representable in one byte.

• The null character is a byte in which all bits are 0.

• The value of each decimal digit after 0 is greater by one than that of the preceding digit.

Wide Characters and Multibyte Characters

C was originally developed in an English-speaking environment where the dominant character set was the 7-bit ASCII code. Since then, the 8-bit byte has become the most common unit of character encoding, but software for international use generally has to be able to represent more different characters than can be coded in one byte, and internationally, a variety of multibyte character encoding schemes have been in use for decades to represent non-Latin alphabets and the nonalphabetic Chinese, Japanese, and Korean writing systems. In 1994, with the adoption of “Normative Addendum 1,” ISO C standardized two ways of representing larger character sets: wide characters, in which the same bit width is used for every character in a character set, and multibyte characters, in which a given character can be represented by one or several bytes, and the character value of a given byte sequence can depend on its context in a string or stream.

Although C now provides abstract mechanisms to manipulate and convert the different kinds of encoding schemes, the language itself doesn’t define or specify any encoding scheme, or any character set except the basic source and execution character sets described in the previous section. In other words, it is left up to individual implementations to specify how to encode wide characters, and what multibyte encoding schemes to support.

Since the 1994 addendum, C has provided not only the type char, but also wchar_t, the wide character type. This type, defined in the header file stddef.h, is large enough to represent any element of the given implementation’s extended character sets.

Although the C standard does not require support for Unicode character sets, many implementations use the Unicode transformation formats UTF-16 and UTF-32 (see http://www.unicode.org) for wide characters. The Unicode standard is largely identical with the ISO/IEC 10646 standard, and is a superset of many previously existing character sets, including the 7-bit ASCII code. When the Unicode standard is implemented, the type wchar_t is at least 16 or 32 bits wide, and a value of type wchar_t represents one Unicode character. For example, the following definition initializes the variable wc with the Greek letter α:

wchar_t wc = L'\x3b1';

The escape sequence beginning with \x indicates a character code in hexadecimal notation to be stored in the variable; in this case, the code for a lowercase alpha.

In multibyte character sets, each character is coded as a sequence of one or more bytes. Both the source and execution character sets may contain multibyte characters. If they do, then each character in the basic character set occupies only one byte, and no multibyte character except the null character may contain any byte in which all bits are 0. Multibyte characters can be used in character constants, string literals, identifiers, comments, and header filenames. Many multibyte character sets are designed to support a certain language, such as the Japanese Industrial Standard character set (JIS). The multibyte UTF-8 character set, defined by the Unicode Consortium, is capable of representing all Unicode characters. UTF-8 uses from one to four bytes to represent a character.

The key difference between multibyte characters and wide characters (that is, characters of type wchar_t) is that wide characters are all the same size, and multibyte characters are represented by varying numbers of bytes. This representation makes multibyte strings more complicated to process than strings of wide characters. For example, even though the character 'A' can be represented in a single byte, finding it in a multibyte string requires more than a simple byte-by-byte comparison, because the same byte value in certain locations could be part of a different character. Multibyte characters are well suited for saving text in files, however (see Chapter 13).

C provides standard functions to obtain the wchar_t value of any multibyte character, and to convert any wide character to its multibyte representation. For example, if the C compiler uses the Unicode standards UTF-16 and UTF-8, then the following call to the function wctomb( ) (read: “wide character to multibyte”) obtains the multibyte representation of the character α:

wchar_t wc = L'\x3B1'; // Greek lower-case alpha, α

Universal Character Names

C also supports universal character names as a way to use the extended character set regardless of the implementation’s encoding. You can specify any extended character by its universal character name, which is its Unicode value in the form:

\uXXXX

or:

\UXXXXXXXX

When you specify a character by its universal character name, the compiler stores it in the character set used by the implementation. For example, if the execution character set in a localized program is ISO 8859-7 (8-bit Greek), then the following definition initializes the variable alpha with the code \xE1:

char alpha = '\u03B1';

However, if the execution character set is UTF-16, then you need to define the variable as a wide character:

wchar_t alpha = L'\u03B1';

In this case, the character code value assigned to alpha is hexadecimal 3B1, the same as the universal character name.

Not all compilers support universal character names

Digraphs and Trigraphs

C provides alternative representations for a number of punctuation marks that are not available on all keyboards. Six of these are the digraphs, or two-character tokens, which represent the characters shown in Table 1-1.

These sequences are not interpreted as digraphs if they occur within character constants or string literals. In all other positions, they behave exactly like the single-character tokens they represent. For example, the following code fragments are perfectly equivalent, and produce the same output. With digraphs:

Without digraphs:

int arr[] = { 10, 20, 30 };

printf( "The second array element is <%d>.\n", arr[1] );

Output:

The second array element is <20>.

C also provides trigraphs, three-character representations, all of them beginning with two question marks. The third character determines which punctuation mark a trigraph represents, as shown in Table 1-2.

Trigraphs allow you to write any C program using only the characters defined in ISO/IEC 646, the 1991 standard corresponding to 7-bit ASCII. The compiler’s preprocessor replaces the trigraphs with their single-character equivalents in the first phase of compilation. This means that the trigraphs, unlike digraphs, are translated into their single-character equivalents no matter where they occur, even in character constants, string literals, comments, and preprocessing directives. For example, the preprocessor interprets the statement’s second and third question marks below as the beginning of a trigraph:

As another substitute for punctuation characters in addition to the digraphs and trigraphs, the header file iso646.h contains macros that define alternative representations of C’s logical operators and bitwise operators, such as and for && and xor for ^. For details, see Chapter 15.


Identifiers

The term identifier refers to the names of variables, functions, macros, structures and other objects defined in a C program. Identifiers can contain the following characters:

• The letters in the basic character set, a–z and A–Z. Identifiers are case-sensitive.

• The underscore character, _.

• The decimal digits 0–9, although the first character of an identifier must not be a digit.

Multibyte characters may also be permissible in identifiers. However, it is up to the given C implementation to determine exactly which multibyte characters are permitted and what universal character names they correspond to.

The following 37 keywords are reserved in C, each having a specific meaning to the compiler, and must not be used as identifiers:

auto        enum        restrict    unsigned
break       extern      return      void
case        float       short       volatile
char        for         signed      while
const       goto        sizeof      _Bool
continue    if          static      _Complex
default     inline      struct      _Imaginary
do          int         switch
double      long        typedef
else        register    union

The following examples are valid identifiers:

x dollar Break error_handler scale64

The following are not valid identifiers:

1st_rank switch y/n x-ray

If the compiler supports universal character names, then α is also an example of a valid identifier, and you can define a variable by that name.


library functions, which you cannot use for functions you define or for global variables. See Chapter 15 for details.

The C compiler provides the predefined identifier __func__, which you can use in any function to access a string constant containing the name of the function. This is useful for logging or for debugging output; for example:

test_func: received null pointer argument

There is no limit on the length of identifiers. However, most compilers consider only a limited number of characters in identifiers to be significant. In other words, a compiler might fail to distinguish between two identifiers that start with a long identical sequence of characters. To conform to the C standard, a compiler must treat at least the first 31 characters as significant in the names of functions and global variables (that is, identifiers with external linkage), and at least the first 63 characters in all other identifiers.

Identifier Name Spaces

All identifiers fall into exactly one of the following four categories, which constitute separate name spaces:

• Label names.

• Tags, which identify structure, union and enumeration types.

• Names of structure or union members. Each structure or union constitutes a separate name space for its members.

• All other identifiers, which are called ordinary identifiers.

Identifiers that belong to different name spaces may be the same without causing conflicts. In other words, you can use the same name to refer to different objects, if they are of different kinds. For example, the compiler is capable of distinguishing between a variable and a label with the same name. Similarly, you can give the same name to a structure type, an element in the structure, and a variable, as the following example shows:

struct pin { char pin[16]; /* ... */ };

_Bool check_pin( struct pin *pin )
{
    int len = strlen( pin->pin );
    /* ... */
}


Identifier Scope

The scope of an identifier refers to that part of the translation unit in which the identifier is meaningful. Or to put it another way, the identifier’s scope is that part of the program that can “see” that identifier. The type of scope is always determined by the location at which you declare the identifier (except for labels, which always have function scope). Four kinds of scope are possible:

Function prototype scope

The parameter names in a function prototype have function prototype scope. Because these parameter names are not significant outside the prototype itself, they are meaningful only as comments, and can also be omitted. See Chapter 7 for further information.

The scope of an identifier generally begins after its declaration. However, the type names, or tags, of structure, union, and enumeration types and the names of enumeration constants are an exception to this rule: their scope begins immediately after their appearance in the declaration, so that they can be referenced again in the declaration itself. (Structures and unions are discussed in detail in Chapter 10; enumeration types are described in Chapter 2.) For example, in the following declaration of a structure type, the last member of the structure, next, is a pointer to the very structure type that is being declared:

first and ptr have block scope.

It is possible to use an identifier again in a new declaration nested within its existing scope, even if the new identifier does not have a different name space. If you do so, then the new declaration must have block or function prototype scope, and the block or function prototype must be a true subset of the outer scope. In such cases, the new declaration of the same identifier hides the outer declaration, so that the variable or function declared in the outer block is not visible in the inner scope. For example, the following declarations are permissible:

double x;                  // Declare a variable x with file scope

long calc( double x );     // Declare a new x with function prototype scope

int main( )
{
    long x = calc( 2.5 );  // Declare a long variable x with block scope

    if( x < 0 )            // Here x refers to the long variable
    {
        float x = 0.0F;    // Declare a new float variable x with block scope
        /* ... */
    }
    /* ... */
}

variable x from within main( ). Furthermore, in the conditional block that depends on the if statement, x refers to the newly declared float variable, which in turn hides the long variable x.

How the C Compiler Works

Once you have written a source file using a text editor, you can invoke a C compiler to translate it into machine code. The compiler operates on a translation unit consisting of a source file and all the header files referenced by #include directives. If the compiler finds no errors in the translation unit, it generates an object file containing the corresponding machine code. Object files are usually identified by the filename suffix .o or .obj. In addition, the compiler may also generate an assembler listing (see Part III).

Object files are also called modules. A library, such as the C standard library, contains compiled, rapidly accessible modules of the standard functions.

The compiler translates each translation unit of a C program (that is, each source file with any header files it includes) into a separate object file. The compiler then invokes the linker, which combines the object files, and any library functions used, in an executable file. Figure 1-1 illustrates the process of compiling and linking a program from several source files and libraries. The executable file also contains any information that the target operating system needs to load and start it.

[Figure 1-1. From source code to executable file: the compiler translates the 1st, 2nd, ..., nth translation units into the 1st, 2nd, ..., nth object files; the linker combines these with the standard library and other libraries into the executable file.]

The C Compiler’s Translation Phases

The compiling process takes place in eight logical steps. A given compiler may combine several of these steps, as long as the results are not affected. The steps are:

1. Characters are read from the source file and converted, if necessary, into the characters of the source character set. The end-of-line indicators in the source file, if different from the new line character, are replaced. Likewise, any trigraph sequences are replaced with the single characters they represent. (Digraphs, however, are left alone; they are not converted into their single-character equivalents.)

2. Wherever a backslash is followed immediately by a newline character, the preprocessor deletes both. Since a line end character ends a preprocessor directive, this processing step lets you place a backslash at the end of a line in order to continue a directive, such as a macro definition, on the next line.

Every source file, if not completely empty, must end with a new line character.

3. The source file is broken down into preprocessor tokens (see the next section, “Tokens”) and sequences of whitespace characters. Each comment is treated as one space.

4. The preprocessor directives are carried out and macro calls are expanded.

Steps 1 through 4 are also applied to any files inserted by #include directives. Once the compiler has carried out the preprocessor directives, it removes them from its working copy of the source code.

5. The characters and escape sequences in character constants and string literals are converted into the corresponding characters in the execution character set.

6. Adjacent string literals are concatenated into a single string.

7. The actual compiling takes place: the compiler analyzes the sequence of tokens and generates the corresponding machine code.

8. The linker resolves references to external objects and functions, and generates the executable file. If a module refers to external objects or functions that are not defined in any of the translation units, the linker takes them from the standard library or another specified library. External objects and functions must not be defined more than once in a program.

For most compilers, either the preprocessor is a separate program, or the compiler provides options to perform only the preprocessing (steps 1 through 4 in the preceding list). This setup allows you to verify that your preprocessor directives have the intended effects. For a more practically oriented look at the compiling process, see Chapter 18.

Tokens

A token is either a keyword, an identifier, a constant, a string literal, or a symbol. Symbols in C consist of one or more punctuation characters, and function as operators or digraphs, or have syntactic importance, like the semicolon that terminates a simple statement, or the braces { } that enclose a block statement. For example, the following C statement consists of five tokens:


The tokens interpreted by the preprocessor are parsed in the third translation phase. These are only slightly different from the tokens that the compiler interprets in the seventh phase of translation:

• Within an #include directive, the preprocessor recognizes the additional tokens <filename> and "filename".

• During the preprocessing phase, character constants and string literals have not yet been converted from the source character set to the execution character set.

• Unlike the compiler proper, the preprocessor makes no distinction between integer constants and floating-point constants.

In parsing the source file into tokens, the compiler (or preprocessor) always applies the following principle: each successive non-whitespace character must be appended to the token being read, unless appending it would make a valid token invalid. This rule resolves any ambiguity in the following expression, for example:

a+++b

Because the first + cannot be part of an identifier or keyword starting with a, it begins a new token. The second + appended to the first forms a valid token, the increment operator, but a third + does not. Hence the expression must be parsed as:

a ++ + b

See Chapter 18 for more information on compiling C programs.


In C, the term object refers to a location in memory whose contents can represent values. Objects that have names are also called variables. An object’s type determines how much space the object occupies in memory, and how its possible values are encoded. For example, the same pattern of bits can represent completely different integers depending on whether the data object is interpreted as signed (that is, either positive or negative) or unsigned, and hence unable to represent negative values.

Typology

The types in C can be classified as follows:

• Basic type

    • Standard and extended integer types

    • Real and complex floating-point types


The basic types and the enumerated types together make up the arithmetic types. The arithmetic types and the pointer types together are called the scalar types. Finally, array types and structure types are referred to collectively as the aggregate types. (Union types are not considered aggregate, because only one of their members can store a value at any given time.)

A function type describes the interface to a function; that is, it specifies the type of the function’s return value, and may also specify the types of all the parameters that are passed to the function when it is called.

All other types describe objects. This description may or may not include the object’s storage size: if it does, the type is properly called an object type; if not, it is an incomplete type. An example of an incomplete type might be an externally defined array variable:

extern float fArr[]; // External declaration

This line declares fArr as an array whose elements have type float. However, because the array’s size is not specified here, fArr’s type is incomplete. As long as the global array fArr is defined with a specified size at another location in the program (in another source file, for example) this declaration is sufficient to let you use the array in its present scope. (For more details on external declarations, see Chapter 11.)

This chapter describes the basic types, enumerations and the type void. The derived types are described in Chapters 7 through 10.

Some types are designated by a sequence of more than one keyword, such as unsigned short. In such cases, the keywords can be written in any order. However, there is a conventional keyword order, which we use in this book.

Integer Types

There are five signed integer types. Most of these types can be designated by several synonyms, which are listed in Table 2-1.

Table 2-1 Standard signed integer types

Type               Synonyms
signed char
int                signed, signed int
short              short int, signed short, signed short int
long               long int, signed long, signed long int
long long (C99)    long long int, signed long long, signed long long int

For each of the five signed integer types in Table 2-1, there is also a corresponding unsigned type that occupies the same amount of memory, with the same alignment: in other words, if the compiler aligns signed int objects on even-numbered byte addresses, then unsigned int objects are also aligned on even addresses. These unsigned types are listed in Table 2-2.

C99 introduced the unsigned integer type _Bool to represent Boolean truth values. The Boolean value true is coded as 1, and false is coded as 0. If you include the header file stdbool.h in a program, you can also use the identifiers bool, true, and false, which are familiar to C++ programmers. The macro bool is a synonym for the type _Bool, and true and false are symbolic constants equal to 1 and 0.

The type char is also one of the standard integer types. However, the one-word type name char is synonymous either with signed char or with unsigned char, depending on the compiler. Because this choice is left up to the implementation, char, signed char, and unsigned char are formally three different types.

If your program relies on char being able to hold values less than zero or greater than 127, you should be using either signed char or unsigned char instead.

You can do arithmetic with character variables. It’s up to you to decide whether your program interprets the number in a char variable as a character code or as something else. For example, the following short program treats the char value in ch as both an integer and a character, but at different times:

char ch = 'A'; // A variable with type char.

printf("The character %c has the character code %d.\n", ch, ch);

for ( ; ch <= 'Z'; ++ch )

printf("%2c", ch);

In the printf( ) statement, ch is first treated as a character that gets displayed, and then as the numeric code value of the character. Likewise, the for loop treats ch as an integer in the instruction ++ch, and as a character in the printf( ) function call.

On systems that use the 7-bit ASCII code, or an extension of it, the code produces the following output:

The character A has the character code 65.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A value of type char always occupies one byte (in other words, sizeof(char) always yields 1) and a byte is at least eight bits wide. Every character in the basic character set can be represented in a char object as a positive value.

Table 2-2 Unsigned standard integer types

Type                  Synonyms
_Bool                 bool (defined in stdbool.h)
unsigned char
unsigned int          unsigned
unsigned short        unsigned short int
unsigned long         unsigned long int
unsigned long long    unsigned long long int

Title	C in a Nutshell
Authors	Peter Prinz, Tony Crawford
Type	Book
Year of publication	2006
City	Beijing
