The New C Standard- P16


6.10.1 Conditional inclusion 1883

• The specification has changed between C90 and C99.

The problem with any guideline recommendation is that the total cost is likely to be greater than the total benefit (a cost is likely to be incurred in many cases and a benefit obtained in very few cases). For this reason no recommendation is made here. The discussion on suffixed integer constants is also applicable in the context of a conditional inclusion directive.


Commentary

The C committee recognized that developers may choose to perform different phases of translation on different hosts. For instance, source files may be preprocessed and then distributed for further translation on other, different, hosts.

Common Implementations

Differences between the numeric values in these two cases are rare (although cases involving ASCII and

Coding Guidelines

Making use of the numeric value of character constants is making use of representation information, which is covered by a guideline recommendation. However, there are cases where deviations may occur.
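Here is a minimal sketch of the two cases being compared. It assumes a host where 'A' has the value 65 in both the translation and execution character sets (as with ASCII). The macro and function names are hypothetical.

```c
/* Value of the character constant as seen by the preprocessor (#if). */
#if 'A' == 65
#define PP_A_IS_65 1
#else
#define PP_A_IS_65 0
#endif

/* Returns 1 when the preprocessor and the executing program agree on
   whether 'A' has the value 65, and 0 in the (rare) case where they
   differ, which the standard permits. */
int char_constant_values_agree(void)
{
    int at_execution = ('A' == 65);
    return at_execution == PP_A_IS_65;
}
```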


Preprocessing directives of the forms

# ifdef identifier new-line group opt

# ifndef identifier new-line group opt

check whether the identifier is or is not currently defined as a macro name.
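The sketch below shows these directives next to the equivalent forms that use the defined operator. FEATURE_X and FEATURE_Y are hypothetical macro names. #ifdef tests only whether a name is defined, not its value. FEATURE_X has an empty replacement list, so #if FEATURE_X would not be a valid #if expression.

```c
#define FEATURE_X            /* defined, with an empty replacement list */

#ifdef FEATURE_X             /* same condition as: #if defined FEATURE_X */
#define VIA_IFDEF 1
#else
#define VIA_IFDEF 0
#endif

#if defined(FEATURE_X)
#define VIA_DEFINED 1
#else
#define VIA_DEFINED 0
#endif

#ifndef FEATURE_Y            /* same condition as: #if !defined FEATURE_Y */
#define VIA_IFNDEF 1
#else
#define VIA_IFNDEF 0
#endif

int ifdef_matches_defined(void) { return VIA_IFDEF == VIA_DEFINED; }
int feature_y_absent(void)      { return VIA_IFNDEF; }
```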

However, there does not appear to be a worthwhile cost/benefit to recommending one of the possibilities.

1886 142) Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000

is signed and positive within a #if expression even though it is unsigned in translation phase 7.
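The sketch below shows how this works in C99. In a #if expression, the signed and unsigned integer types act as if they were intmax_t and uintmax_t. The macro and function names are hypothetical. The second test shows why suffixes matter: the u suffix makes the constant unsigned, so -1 is converted to a large unsigned value before the comparison.

```c
#include <limits.h>

/* In #if, 0x8000 is treated as intmax_t (at least 64 bits), so it is
   signed and positive whatever the width of int. */
#if -1 < 0x8000
#define PP_8000_SIGNED 1
#else
#define PP_8000_SIGNED 0
#endif

/* A u suffix makes the constant unsigned (uintmax_t) in #if; -1 is then
   converted to UINTMAX_MAX, so the comparison is false. */
#if -1 < 0u
#define PP_MINUS_ONE_LT_0U 1
#else
#define PP_MINUS_ONE_LT_0U 0
#endif

int pp_8000_signed(void)     { return PP_8000_SIGNED; }
int pp_minus_one_lt_0u(void) { return PP_MINUS_ONE_LT_0U; }

/* In translation phase 7, 0x8000 has type unsigned int when INT_MAX is
   0x7FFF (comparison false), and type int when int is wider (true). */
int phase7_8000_signed(void) { return -1 < 0x8000; }
```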


Commentary

The order is from the lowest line number to the highest line number.

It may be possible to obtain some translation time performance advantage (at least for the original developer) by appropriately ordering the directives. Unlike developer behavior with if statements, developers do not usually aim to optimize speed of translation when deciding how to order conditional inclusion directives (experience suggests that developers often simply append new directives to the end of any existing directives).

Recognizing a known pattern in a sequence of directives has several benefits for readers. They can make use of any previous deductions they have made on how to interpret the directives and what they represent, and the usage highlights common dependencies in the source. In the following code fragment more reader effort is required to spot similarities in the order in which directives are checked than if both sequences of directives had occurred in the same order.
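Below is a hypothetical fragment of this kind; all macro names are assumptions made for illustration. Both sequences test the same configuration macros, but in opposite orders, so a reader has to check that they correspond. In this sketch, if both USE_UNIX and USE_WIN32 were defined, the two sequences would select alternatives for different hosts. Testing the macros in the same order makes the correspondence visible at a glance.

```c
/* First sequence: tests USE_WIN32 before USE_UNIX. */
#if defined(USE_WIN32)
#define PATH_SEP '\\'
#elif defined(USE_UNIX)
#define PATH_SEP '/'
#else
#define PATH_SEP '/'
#endif

/* Second sequence: the same macros, tested in the opposite order. */
#if defined(USE_UNIX)
#define LINE_END "\n"
#elif defined(USE_WIN32)
#define LINE_END "\r\n"
#else
#define LINE_END "\n"
#endif

char path_separator(void)  { return PATH_SEP; }
const char *line_end(void) { return LINE_END; }
```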

Given the lack of attention from developers on the relative ordering of directives and the benefits of using the same ordering, where possible, a guideline recommendation appears worthwhile. However, a guideline recommendation needs to be automatically enforceable, and determining when two sequences of directives have the same effect during translation may be infeasible because information that is not contained within the source may be required (e.g., dependencies between macro names that are likely to be defined via translator command line options).

Where possible the visual order of evaluation of expressions within different sequences of nested

conditional inclusion directives shall be the same.

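If it evaluates to false (zero), the group that it controls is skipped: directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals;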

Commentary

A parallel can be drawn with the behavior of if statements, in that if their controlling expression evaluates to zero, during program execution, any statements in the associated block are skipped.


Commentary

The preprocessor operates on a representation of the source written by the developer, not translated machine code. As such it needs to perform some processing on its input to be able to deduce when to stop skipping.


Figure 1889.1: Number of top-level source files (i.e., the contents of any included files are not counted) and (right) complete translation of this book’s benchmark programs.

Directives need to be processed to keep track of the level of nesting of conditionals, translation phases 1–3 still need to be performed (line splicing could affect what is or is not the start of a line), and characters within a comment must not be treated as directives.

The intent of only requiring a minimum of directive processing, while skipping, is to enable partially written source code to be skipped and to allow preprocessors to optimize their performance in this special case, speeding up the rate at which the input is processed.
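The following minimal sketch illustrates these rules; the function name is hypothetical. The group controlled by #if 0 is skipped. The nested #if/#endif pair must still be counted, but the #endif inside the comment must not be. If a translator got either of these wrong, one of the #error directives would be processed and translation would fail.

```c
#if 0
/*
#endif    inside a comment, removed in translation phase 3, so not a directive
*/
#if this expression is never evaluated
#error not reached: only the directive name is examined, to track nesting
#endif
#error not reached: the whole outer group is skipped
#endif

/* Processing resumes normally after the outer #endif. */
int skipped_group_ok(void) { return 1; }
```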


6.10.2 Source file inclusion 1896

Example

In the following the #define directive is not well formed. But because this group is being skipped the translator is required to ignore this fact.
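Here is a sketch of such a group; the directives and names are hypothetical. Because the group is skipped, only the directive name define is examined, and the malformed remainder of each line is ignored.

```c
#if 0
#define 42 not an identifier, so not a valid macro name
#define F(a, a) duplicate parameter names
#endif

/* Processing resumes normally after the #endif. */
#define VALID_AFTER_SKIP 1

int after_skipped_group(void) { return VALID_AFTER_SKIP; }
```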

This group is processed exactly as-if it appeared in the source outside of any group.

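If none of the conditions evaluates to true, and there is a #else directive, the group controlled by the #else is processed;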

Commentary

A semantic rule to associate #else with the lexically nearest preceding #if (or similar form) directive, like the one given for if statements, is not needed because conditional inclusion is terminated by a #endif directive.

Like the matching #if (or similar form) directive case, all preprocessing tokens in the group are treated as if they appeared outside of any conditional inclusion directive. Processing continues until the first #endif is encountered (which must match the opening directive).

The arguments made for if statements always containing an else arm might be thought to also apply to conditional inclusion. However, the presence of a matching #endif directive reduces the likelihood that readers will confuse which preprocessing directive any #else associates with (although other issues, such as lack of indentation or a large number of source lines between directives, can make it difficult to visually associate matching directives).

during preprocessing (but not in a group that is skipped). Also, there is no requirement that the spelling of the header in the C source file be represented by a source file of the same spelling. The C Standard has no explicit knowledge of file systems and is silent on the issue of directory structures. Minimum required limits on the implementation processing of a header name are specified elsewhere.

Failure to locate a header or source file that can be processed by the implementation (e.g., a file of the specified name does not exist, at least along the places searched) is a constraint violation.


Other Languages

Most languages do not specify a #include mechanism, although many of their implementations provide one. The approach commonly used by C implementations is popular, but not universal. Some languages explicitly state that a #include directive denotes a file of the given name in the translator’s host environment.

For most implementations the header name maps to a file name of the same spelling. It is quite common for the translation environment to ignore the case of alphabetic letters (e.g., MS-DOS and early versions of Microsoft Windows), or to limit the number of significant characters in the file name denoted by a header name (the remaining characters being ignored). Use of the / character in specifying a full path to a file is sufficiently common that even host environments where this character is not normally associated with a directory separator support such usage in header names (many Microsoft Windows translators support this character, as well as the \ character, as a directory separator).

In the majority of implementations #include directives specify files containing source in text form. However, some implementations support what are known as precompiled headers.

It is not uncommon (over 10% of #includes in Figure 1896.1) for the same header to be #included more than once when translating a source file (implementations are required to support this usage for standard headers). The following are some of the techniques implementations use to reduce the overhead of subsequent #includes:

• A common convention is to bracket the contents of a header, starting with the preprocessing token sequence #ifndef __H_file_name__ / #define __H_file_name__ and ending with #endif. The processing of subsequent #includes of the same header is then reduced to the minimal processing needed to skip to the matching #endif. Some implementations (e.g., gcc) go one step further: they detect headers that contain such bracketing the first time they are processed, and completely skip opening and processing the header if it is subsequently encountered again in a #include directive.

• Support the preprocessing directive #import.[359] This directive is equivalent to the #include directive except that if the specified header has already been included it is not included again.
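The following sketch shows the bracketing convention. The header and its guard name COUNTER_H are hypothetical. The sketch uses a name like COUNTER_H rather than one beginning with two underscores, because identifiers that begin with an underscore followed by an uppercase letter or another underscore are reserved for the implementation. The guarded contents appear twice to simulate the header being #included twice in the same translation unit.

```c
/* First inclusion: COUNTER_H is not yet defined, so the group is processed. */
#ifndef COUNTER_H
#define COUNTER_H
static int counter_start(void) { return 42; }
#endif

/* Second inclusion: COUNTER_H is now defined, so the group is skipped and
   counter_start is not defined a second time (which would be an error). */
#ifndef COUNTER_H
#define COUNTER_H
static int counter_start(void) { return 42; }
#endif

int counter_initial_value(void) { return counter_start(); }
```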

Some coding guideline documents recommend that implementation-supplied headers appear before developer-written headers in a source file. Such recommendations overlook the possibility that a developer-written header might itself #include an implementation header.

Figure (not reproduced; axes: #includes and unnecessary headers #include’d): ... denote all headers (i.e., all system headers are counted), triangles denote all headers delimited by quotes (i.e., likely to be user ... form of this book’s benchmark programs.

Experience suggests that once a #include directive appears in a source file it is rarely removed (see Figure 1896.2) and that new #include directives are simply added after the last one. The issue of redundant

There does not appear to be a worthwhile benefit in ordering #include directives in any way (apart from any relative ordering dictated by dependencies between headers).

Figure (rank plot, not reproduced): ... "header-name" gives, respectively, an exponent of -2.26, xmin = 8, and -1.8, xmin = 9. Based on the visible form of the .c files.

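A preprocessing directive of the form

# include <h-char-sequence> new-line

searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header.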

Commentary

File systems invariably provide a unique method of identifying every file they contain (e.g., a full path name). The base document recognized the disadvantages of requiring that the full path name be specified in each #include directive and permitted a substring of it to be given. The implementation-defined places are usually additional character sequences (e.g., directory names) added to the h-char-sequence in an attempt to create a full path name that refers to an existing file.

Standard intends that the rules which are eventually provided by the implementor correspond as closely as possible to the original K&R rules. The primary reason that explicit rules were not included in the Standard is the infeasibility of describing a portable file system structure. It was considered unacceptable to include UNIX-like directory rules due to significant differences between this structure and other popular commercial file system structures.

within an included file entails a search for the named file relative to the file system directory that holds the

to the same current directory. The C89 Committee decided in principle in favor of the K&R approach, but was unable to provide explicit search rules as explained above.

is often different from that used for the form "q-char-sequence". For instance, in the <h-char-sequence> case the contents of /usr/include might be searched first, followed by the contents of the directory containing the .c file, while in the "q-char-sequence" case the contents of the directory containing the .c file might be searched first, followed by other places.

The environment in which a translator executes may also affect the sequence of places that are searched. For instance, the interpretation of relative path names (e.g., /proj/abc.h) depends on the identity of the current directory.

gcc searches two directories: /usr/include and another directory that holds very machine-specific files, such as stdarg.h (e.g., /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/include on your author’s computer). gcc supports the #include_next directive. This directive causes the search algorithm to skip some of the initial implementation-defined places that would normally be searched. The initial places that are skipped are those that were searched in locating the file containing the #include_next directive (including the place where the search succeeded).

Tzerpos and Holt[1416] describe a well-formedness theory of header inclusion that enables unnecessary #include directives to be deduced.

The standard does not specify the order in which the implementation-defined places are searched. This is a potential coding guideline issue because it is possible that an h-char-sequence will match in more than one of the places (i.e., there is a file having the same name along several of the different possible search paths). The behavior is thus dependent on the order in which the places are searched (it is assumed that the contents of the different headers will be different).

Experience suggests that when a translator locates a #included file different from the one the developer expected, there is one of two consequences: (1) when the contents of the file accessed are similar to those of the one intended (e.g., a different version of the intended file) the source file may be successfully translated, and (2) when the contents of the file accessed have no connection with the intended file the source is rarely successfully translated. The problem might therefore be considered to be one of version management, rather than the choice of characters used in an h-char-sequence. There are a number of reasons why a solution to this issue is to not use h-char-sequences at all, including the following:

• For the < > delimited form, implementations usually look in a predefined location first (as described in the Common Implementations section above and in the following C sentence). Ensuring that the names chosen by developers for the headers they create are different from those of system headers is an almost impossible task. While it might be possible to enumerate the set of existing file names of system headers contained in commercially important environments, members are likely to be added to this set on a regular basis.

Rather than trying to avoid using file names likely to match those of system headers, developers could ensure that places containing system headers are searched last.

• The < > delimited form is often considered to denote externally supplied headers (e.g., provided by the implementation or translator environment vendor). What constitutes a system-supplied header is open to interpretation. One distinction that can be made between system and developer headers is that developers do not control the contents of system headers. Consequently, it can be argued that their contents are not subject to coding guidelines.

Headers whose contents have been written by developers are subject to coding guidelines. The convention generally adopted to indicate this status is to use the double-quote delimited form of #include.

Developers sometimes specify full path names in headers (see Table 1896.1). This is a configuration management issue and is not considered to be within the scope of these coding guidelines.

Table (not reproduced): ... translation environments. Information was automatically extracted and represents an approximate lower bound. Versions of the translation environments from approximately the same year (mid 1990s) were used. The counts for ISO C assume that the minimum set of required identifiers are declared and exclude the type generic macros.

1898 How the places are specified or the header identified is implementation-defined.

Implementations invariably search one or more predefined locations first (e.g., /usr/include), followed by a list of alternative places. Several techniques allow developers to specify a list of alternative places to search for files that correspond to the headers named in a #include directive. For instance, the alternative places may be specified via a translator command line option (e.g., -I), in a translator configuration file (e.g., gcc version 2.91.66 hosted on RedHat Linux reads many default locations from the

is still hard coded in the translator sources), or an environment variable (e.g., several Microsoft Windows-based translators use INCLUDE).

The directory separators used in Unix and MS-DOS slant in different directions. Many implementations, in both environments, recognize both characters as directory delimiters. One consequence of this is that escape sequences are not recognized as such (something that is unlikely to be a problem in header names). The RISCOS environment does not support filenames ending in .h. The implementation-defined behavior for this host is to look in a directory called h, for a file of the given name with the .h removed.
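The RISCOS mapping can be sketched as a small string transformation. The function name is hypothetical. The host directory separator is passed in rather than assumed; the usage below passes '.', which is the separator RISCOS uses.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the mapping described for RISCOS: a header name
   ending in ".h" is looked up as the file <stem> in a directory called h.
   Returns 1 and fills out on success; returns 0 if the name does not end
   in ".h" (with a non-empty stem) or if out is too small. */
int map_dot_h_to_directory(const char *header, char dir_sep,
                           char *out, size_t out_size)
{
    size_t len = strlen(header);

    if (len < 3 || strcmp(header + len - 2, ".h") != 0)
        return 0;

    /* "h", then the separator, then the stem without its ".h" suffix. */
    int written = snprintf(out, out_size, "h%c%.*s",
                           dir_sep, (int)(len - 2), header);
    return written >= 0 && (size_t)written < out_size;
}
```

For example, map_dot_h_to_directory("stdio.h", '.', buf, sizeof buf) produces "h.stdio".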

The implementation-defined behavior associated with how the places are specified occurs outside of the source code and is the remit of configuration management guidelines. For this reason nothing further is said here.

1899 A preprocessing directive of the form

# include "q-char-sequence" new-line

causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.

Commentary

The commonly accepted intent of this form of the #include directive is that it is used to reference source files created by developers (i.e., headers that are not provided as part of the implementation or host environment). The only syntactic difference between q-char-sequence and h-char-sequence is that neither sequence may contain its respective delimiter.

Most q-char-sequences end with one of two character sequences (i.e., .c or .h). The character sequence before these suffixes is often called the header name.

Other Languages

The use of double-quote as the delimiter is the almost universal form used in other languages (although some

use the’character because that is what is used to delimit string literals).

The term commonly used to refer to these source files is header, the context of the conversation often being used to distinguish any other intended usage. The intent is that the contents of these source files are controlled by developers and as such they are subject to coding guidelines.

Commentary

While this “implementation-defined manner” might be the same as that for the < > delimited form, the intent is for it to be sufficiently different that developers do not need to be concerned about the name of a header created by them matching one provided as part of the implementation (and therefore potentially found by the translator when searching for a matching header). For instance, your author does not know the names of most of the 304 files (e.g., compface.h) contained in /usr/include on his software development computer.

The search algorithm used invariably differs from that used for the < > delimited form (otherwise there would be little point in distinguishing the two cases). The search algorithm used by some implementations is to first look in the directory containing the source file currently being translated (which may itself have been included). If that search fails, and the current source file has itself been included, the directory containing the source file that #included it is then searched. This process continues back through any nested #include directives. For instance, in:

(assuming the translation environment supports the path names used), translating the source file file_1.c causes file_2.c to be included, which in turn includes file_3.c. The source file abc.h will be searched for in the directories /foo, /another/path and then the directory containing file_1.c.
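The nested search order can be modeled with a small function. The function and parameter names are hypothetical. The model ignores any places added by translator options and any other places searched afterwards. For a chain of three nested files whose directories are, say, /src, /another/path and /foo (innermost last), the search starts in /foo and ends in /src.

```c
#include <stddef.h>

/* Hypothetical model of the "q-char-sequence" search described above.
   include_dirs[0] is the directory of the source file originally
   translated; include_dirs[depth - 1] is the directory of the file that
   contains the #include being processed. Directories are copied into
   order[], innermost first; the number of entries written is returned. */
size_t quote_search_order(const char *const include_dirs[], size_t depth,
                          const char *order[], size_t max_order)
{
    size_t n = 0;

    /* Start with the directory of the current file, then walk back
       through the chain of files that #included it. */
    for (size_t i = depth; i > 0 && n < max_order; i--)
        order[n++] = include_dirs[i - 1];

    return n;
}
```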

Some implementations use the double-quote delimited form within their system headers to change the default first location that is searched. For instance, a third-party API may contain the header abc.h, which in turn needs to include ayx.h. Using the form "ayx.h" means that the implementation will search in the directory containing abc.h first, not /usr/include. This usage can help localize the files that belong to specific APIs. Other implementations use a search algorithm that starts with the directory containing the original source file being translated.

If the source file is not found after these places have been searched, some implementations then search other places specified via any translator options. Other implementations simply follow the behavior described by the following C sentence (which has the consequence of eventually checking these other places).

# include <h-char-sequence> new-line


Commentary

The previous search can fail in the sense that it does not find a matching source file.

Some existing code uses the double-quote delimited form of #include directive to include headers provided by the implementation (rather than the < > delimited form). This requirement ensures that such code continues to be conforming.

A preprocessing directive of the form

# include pp-tokens new-line

(that does not match one of the two previous forms) is permitted.


Commentary

This implementation-defined behavior may take a number of forms, including:

• The ## operator can be used to glue preprocessing tokens together. However, the behavior is undefined if the resulting character sequence is not a valid preprocessing token. For instance, the five preprocessing tokens {<} {string} {.} {h} {>} cannot be glued together to form a valid preprocessing token without going through intermediate stages whose behavior is undefined.

• Creating a preprocessing token, via macro expansion, having the double-quote delimited form (i.e., a string preprocessing token) need not depend on any implementation-defined behavior. The stringize

does the implementation strip off the space character at the ends of the delimited character sequence?
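The sketch below shows both approaches; the macro names are hypothetical. The first relies only on the # operator. The second depends on how the implementation combines the tokens between < and > into a header name, which the standard leaves implementation-defined.

```c
/* Two levels are needed so that the argument is macro-expanded before
   # turns it into a string literal. */
#define STR(x)       #x
#define XSTR(x)      STR(x)
#define HEADER(name) XSTR(name.h)

/* Expands to #include "stddef.h". If the search for this q-char form
   fails, the directive is reprocessed as #include <stddef.h>. */
#include HEADER(stddef)

/* Expands to the tokens < limits . h >; how these are combined into a
   header name is implementation-defined. */
#define LIMITS_HEADER <limits.h>
#include LIMITS_HEADER

int macro_includes_worked(void)
{
    struct pair { char first; int second; };
    /* offsetof comes from <stddef.h>, CHAR_BIT from <limits.h>. */
    return offsetof(struct pair, first) == 0 && CHAR_BIT >= 8;
}
```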


Commentary

This C sentence and the following ones in this C paragraph are a specification of the minimum set of requirements that an implementation must meet. For sequences outside of this set the implementation mapping may be non-unique (like, for instance, the Microsoft Windows technique of mapping files ending in .html to .htm). The handling of character sequences that resemble UCNs may also differ, e.g., "\ubada\file.txt" (Ubada is a city in Tanzania and BADA is the Hangul symbol ᄇ ᅮ ᇁ in ISO 10646). The standard does not permit any number of period characters because many operating systems do not permit them (at least one, RISCOS, does not permit any).

The wording was changed by the response to DR #302 to extend the specification to be more consistent

with C++.

C++

16.2p5


The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10) followed

by a period (.) and a single nondigit.

Other Languages

Other languages either are specified to operate within the same operating system and file system limitations as C, and as such have to deal with the same issues, or require an integrated development environment to be created before they can be used.

Implementations invariably pass the sequence of characters that appear between the delimiters (when searching other places a directory path may be added) as an argument in a call to fopen or an equivalent system function. The called library function will eventually call some host operating system function that interfaces to the host file system. The C translator’s behavior is thus controlled by the characteristics of the host file system and how it maps character sequences to file names. The handling of the period character varies between file systems; known behaviors include:

• Unix based file systems permit more than one period in a file name.

• MS-DOS based file systems only permit a single period in a file name.

• RISCOS, an operating system for the Acorn ARM processor, does not support filenames that contain a period. On this host, file names containing a period that were specified in a #include directive were mapped using a directory structure. All file names ending in the characters .h were searched for in a directory called h.

Because an implementation is not required to provide a unique mapping for all sequences, it is possible that an unintended header or source file will be accessed, or that the translator will fail to identify a known header or source file. The possible consequences of an unintended access are discussed elsewhere, while failure to

with using character sequences having a unique mapping in the different environments that the source may

be translated in is outside the scope of these coding guidelines.

1910 The first character shall not be a digit.


Commentary

These permissions reflect known characteristics of file systems in which translators are executed.

C90

The limit specified by the C90 Standard was six significant characters. However, implementations invariably used the number of significant characters available in the host file system (i.e., they did not artificially limit the number of significant characters). It is unlikely that a header or source file will fail to be identified because of a difference in what used to be a non-significant character.

C++

The C++ Standard does not give implementations any permission to restrict the number of significant characters before the period (16.2p5). However, the limits of the file system used during translation are likely to be the same for both C and C++ implementations and consequently no difference is listed here.

All file systems place some limits on the number of characters in a source file name; for instance:

• Most versions of the Microsoft DOS environment ignore the distinction of alphabetic case and restrict

the mapping to eight significant characters before any period (and a maximum of three after it).

• POSIX requires that at least 14 characters be significant in a file name (it also requires implementations

to support at least 255 characters in a pathname) Many Linux file systems support up to 255 characters

in a filename and 4095 characters in a pathname.

The potential problems associated with limits on the sequences of characters that are likely to be treated as unique are a configuration management issue that is outside the scope of these coding guidelines.
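The sketch below checks whether a header name falls inside the minimum mapping that C99 guarantees. That mapping is one or more nondigits or digits, the first of which is not a digit, then a period and a single nondigit, with at most eight significant characters before the period. The function is hypothetical. It treats only letters and underscore as nondigits and does not consider universal character names or case distinctions.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if name is within the minimum header name mapping described
   above, including the limit of eight characters before the period. */
int within_minimum_header_mapping(const char *name)
{
    const char *dot = strchr(name, '.');
    size_t stem_len;

    if (dot == NULL)
        return 0;
    stem_len = (size_t)(dot - name);
    if (stem_len == 0 || stem_len > 8)
        return 0;
    if (isdigit((unsigned char)name[0]))    /* first character not a digit */
        return 0;
    for (size_t i = 0; i < stem_len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_')
            return 0;
    }
    /* Exactly one nondigit after the period, and nothing else. */
    return (isalpha((unsigned char)dot[1]) || dot[1] == '_') && dot[2] == '\0';
}
```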

1912 A #include preprocessing directive may appear in a source file that has been read because of a #include directive in another file, up to an implementation-defined nesting limit (see 5.2.4.1).

Commentary

Thus #include directives can be nested within source files whose contents have themselves been #included. This issue is discussed elsewhere. While this permission only applies to source files, an implementation using some form of precompiled headers (which are not source files within the standard’s definition of the term) that did not support this functionality would not be popular with developers.

#include <stdio.h>

#include "myprog.h"

Other Languages

Some languages only have a single form of #include directive for all headers.


6.10.3 Macro replacement

1919

Commentary

This example does not illustrate any benefit compared to that obtained from placing separate #include directives in each arm of the conditional inclusion directive.

1915 Forward references: macro replacement (6.10.3).

1916 145) Note that adjacent string literals are not concatenated into a single string literal (see the translation

Commentary

This is actually a definition in a Constraints clause (it is used by two constraints in this C subsection).

The check against same spelling only needs to take into account the significant characters of an identifier.

• Interfere with existing code as little as possible.

• Keep the preprocessing model simple and uniform.

• Allow macros to be used wherever functions can be.

• Define macro expansion such that it produces the same token sequence whether the macro calls appear in open text, in macro arguments, or in macro definitions.

Preprocessing is specified in such a way that it can be implemented either as a separate text-to-text prepass or as a token-oriented portion of the compiler itself. Thus, the preprocessing grammar is specified in terms of tokens.


Commentary

There was an existing body of code, containing redefinitions of the same macro, when the C Standard was first written. The C committee did not want to specify that existing code containing such usage was non-conforming, but they did consider the case where the bodies of any subsequent definitions differed to be

Any subsequent #undef of the macro name popping this stacked definition and to make it the current one.

C permits more than one definition of the same macro name, with the same body, and more than one external definition of the same object, with the same type, and the coding guideline issues are the same for both (in both cases translators are not always required to issue a diagnostic if the definitions are considered to be different).

In both cases a technique for avoiding duplicate definitions, during translation but not in the visible source, is to bracket definitions with #ifndef MACRO_NAME / #endif (in the case of the file scope object a macro name needs to be created and associated with its declaration). Using this technique has the disadvantage that it prevents the translator from checking that any subsequent redeclarations of an identifier are the same (unless the bracketing occurs around the only textual declaration that occurs in any source file used to build a program).

An identifier currently defined as a function-like macro shall not be redefined by another #define preprocessing directive unless the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical.

Commentary

The issues are the same as for object-like macros, with the addition of checks on the parameters. Requiring that the parameters be spelled the same, rather than, for instance, that they have an identical effect, simplifies the similarity checking of two macro bodies. For instance, in:

a translator is not required to deduce that the two definitions of FM are structurally identical.
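The hypothetical pair of definitions below illustrates the rule; the name FM is reused only for continuity with the text. The amount of white space between tokens may differ, because all white-space separations are considered identical. A change in parameter spelling is not permitted, even when the effect is the same.

```c
#define FM(a) ((a) + 1)
/* Valid redefinition: same parameter spelling, and the replacement lists
   differ only in the amount of white space between tokens. */
#define FM(a)    ((a)   +   1)

/* The following would be a constraint violation, because the parameter
   is spelled differently even though the effect is the same:

#define FM(b) ((b) + 1)
*/

int fm_of_two(void) { return FM(2); }
```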

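There shall be white-space between the identifier and the replacement list in the definition of an object-like macro.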

Commentary

In the following (assuming $ is a member of the extended character set and permitted in an identifier preprocessing token):


Correction Add to subclause 6.8, page 86 (Constraints):

In the definition of an object-like macro, if the first character of a replacement list is not a character required by subclause 5.2.1, then there shall be white-space separation between the identifier and the replacement list.*

[Footnote *: This allows an implementation to choose to interpret the directive:

#define THIS$AND$THAT(a, b) ((a) + (b))

makes, it must also issue a diagnostic.]

However, the complex interaction between this specification and UCNs was debated during the C9X review process and it was decided to simplify the requirements to the current C99 form.

If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undefined.

The behavior of macro invocations containing empty arguments was discussed in DR #003q3, DR #153, and raised against C99 in DR #259 (no committee response was felt necessary).

What was undefined behavior in C90 (an empty argument) is now explicitly supported in C99. The two most likely C90 translator undefined behaviors are either to support them (existing source developed using such a translator may contain empty arguments in a macro invocation), or to issue a diagnostic (existing source developed using such a translator will not contain any empty arguments in a macro invocation).
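A sketch of an empty argument, as supported by C99, using a hypothetical macro SIGNED_VALUE (a C90 translator may diagnose the second invocation):

```c
/* Hypothetical macro; in C99 either argument may be empty. */
#define SIGNED_VALUE(sign, value) (sign value)

static int negative = SIGNED_VALUE(-, 5);  /* expands to (- 5) */
static int no_sign  = SIGNED_VALUE(, 5);   /* empty first argument: ( 5) */
```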

C++

The C++ Standard contains the same wording as the C90 Standard.

C++ translators are not required to correctly process source containing macro invocations having any empty arguments.

Some C90 implementations (e.g., gcc) treated empty arguments as an argument containing no preprocessing

tokens, while others (e.g., Microsoft C) treated an empty argument as being a missing argument (i.e., a

when the trailing arguments are included in a list of arguments to another macro or function. For example, if

#define dprintf(format, ...) \
        dfprintf(stderr, format, __VA_ARGS__)

and it were allowed for there to be only one argument, then there would be a trailing comma in the expanded form. While some implementations have used various notations or conventions to work around this problem, the Committee felt it better to avoid the problem altogether.

While some developers may be confused because the requirements on the number of arguments are different

from functions defined using the ellipsis notation, passing too few arguments is a constraint violation (i.e.,

translators are required to issue a diagnostic that a developer then needs to correct).
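A sketch of the same forwarding pattern, assuming a hypothetical macro FORMAT_INTO that writes into a caller-supplied array (rather than calling the Rationale's dfprintf):

```c
#include <stdio.h>
#include <string.h>

/* The variable arguments are forwarded, after format, to snprintf. */
#define FORMAT_INTO(buf, format, ...) \
    snprintf(buf, sizeof (buf), format, __VA_ARGS__)

/* FORMAT_INTO(text, "none") would be a constraint violation in C99: there
   must be more arguments than named parameters, so at least one argument
   has to correspond to the ellipsis. */

static char text[32];

static const char *fill_text(void)
{
    FORMAT_INTO(text, "%d-%d", 1, 2);
    return text;
}
```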

) terminates it

Commentary

While this requirement is specified in the syntax, it is interpreted as requiring the ) preprocessing token to occur before any macro replacement of the identifiers following the matching ( preprocessing token. For instance, if R_PAREN is defined as an object-like macro whose replacement list is ), then in F(x R_PAREN ); the invocation of F is terminated by the ) preprocessing token that occurs immediately before the ;, not by the expanded form of R_PAREN.
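A sketch of this interpretation, using hypothetical macros R_PAREN and ID. The source looks unbalanced, yet it is valid, because the ) written before the ; terminates the invocation and the expanded R_PAREN closes the outer parenthesis:

```c
/* Hypothetical macros. */
#define R_PAREN )
#define ID(x) x

/* The invocation of ID ends at the ) written just before the ;.
   Its argument, 3 R_PAREN, is macro expanded to 3 ), so the result
   after replacement is  static int closed_value = (3 );  */
static int closed_value = (ID(3 R_PAREN );
```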

The identifier __VA_ARGS__ shall occur only in the replacement-list of a function-like macro that uses the ellipsis notation in the parameters.

Commentary

This requirement simplifies a translator’s processing of occurrences of the identifier __VA_ARGS__.

This typographical correction was made by the response to DR #234.

C90

Support for __VA_ARGS__ is new in C99.

Source code declaring an identifier with the spelling __VA_ARGS__ will cause a C99 translator to issue a diagnostic (the behavior was undefined in C90).

There is one name space for macro names.

Commentary

Object-like and function-like macro names exist in the same name space. However, an identifier defined as a function-like macro is only treated as such when its name is followed by an opening parenthesis.

Any white-space characters preceding or following the replacement list of preprocessing tokens are not considered part of the replacement list for either form of macro.

Commentary

Specifying that such white-space should be considered to be part of the replacement list has potential maintenance and comprehension costs (it restricts how the start of the replacement list may be indented, and white-space following the replacement list is not immediately visible to readers) for no obvious benefit.

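If a # preprocessing token, followed by an identifier, occurs lexically at the point at which a preprocessing directive could begin, the identifier is not subject to macro replacement.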

Commentary

This is a special case of a more general specification given elsewhere.

Some preprocessors used to perform this kind of replacement (some past entries in the Obfuscated C contest[642] relied on such translator behavior).

Example

For instance, even if the identifier define is defined as a macro whose replacement list is undef, a line starting #define is still processed as a macro definition directive, and not as an #undef directive.
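A sketch of this behavior, assuming define is given the hypothetical replacement list undef:

```c
/* define is an ordinary identifier, so it may be defined as a macro. */
#define define undef

/* Still a #define directive: the identifier following # at the start of a
   directive is not macro replaced, so ANSWER is defined, not undefined. */
#define ANSWER 42

#undef define
```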

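A preprocessing directive of the form

# define identifier replacement-list new-line

defines an object-like macro that causes each subsequent instance of the macro name to be replaced by the replacement list of preprocessing tokens that constitute the remainder of the directive.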

The preprocessing tokens in a text-line are unconditionally scanned for instances of macro names to expand, as are preprocessing tokens in some preprocessing directives.

The standard lists a few restrictions on identifiers that can be defined as macro names. The issue of predefined

Implementations invariably provide a mechanism that is external to the source code for defining macros, e.g., the -D command line option.

• Parameterizing the definition of a type. This issue is discussed in more detail under typedef names.

end to represent the C punctuators { and } respectively (this existing usage was one reason these macro names were not used as alternative spellings, in <iso646.h>, for these punctuators; it could have rendered existing conforming code nonconforming), or a developer wanting to modify existing code to use greater floating-point precision might define the macro name float to be double.

The growth in the usage of languages with C-like syntax over the last 10 years means that these days it is rare for developers to attempt to change the visual appearance of C source to be closer to a language they are more familiar with. While a macro name that maps to a C token may be surprising to readers of the source, it is unlikely to conflict with their existing C knowledge, and therefore might be considered at worst a minor inconvenience (i.e., cost).

Defining a macro whose name is the same as a keyword means that the behavior of translated source can be very different from that expected from its visual appearance (such usage also results in undefined behavior if the definition occurs prior to the inclusion of any library header). The presence of such a definition requires that readers replace their existing, default, knowledge of the keyword’s behavior with a new behavior (assuming that they had noticed the definition of the macro). Experience suggests that the short-term benefit of defining and using such macro names is less than the longer-term (which may be only a few days) costs associated with comprehension and miscomprehension of the affected source.

A source file shall not define a macro name to have the spelling of a keyword.
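A sketch of the kind of usage this guideline rules out. The definition of float occurs after any header inclusion and is removed again straight afterwards; scale is a hypothetical object:

```c
/* A macro name with the spelling of a keyword (the usage ruled out above). */
#define float double

static float scale = 0.1;   /* reads as a float; is actually a double */

#undef float                /* restore the keyword's usual meaning */
```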

Replacement lists may look innocuous enough when viewed in isolation. However, in the context in which they occur, the expanded form may interact in unexpected ways with adjacent tokens. For instance, consider a macro SUM whose replacement list is a+b, used in the initialization of loc as glob*SUM. Looking at these components in isolation, the appearance of the replacement list of SUM suggests that a will be added to b, and the use of SUM in the initialization of loc suggests that it will be multiplied by the value of glob. However, the token sequence after macro replacement is glob*a+b, which has a very different interpretation.
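A sketch of the SUM case, with hypothetical values for glob, a, and b, comparing the unbracketed replacement list with a bracketed one:

```c
static int glob = 2, a = 3, b = 4;

/* Unbracketed replacement list. */
#define SUM a+b
/* Bracketed replacement list: expands to a single primary expression. */
#define SUM_BRACKETED (a+b)

static int loc_unbracketed(void) { return glob*SUM; }            /* glob*a+b   */
static int loc_bracketed(void)   { return glob*SUM_BRACKETED; }  /* glob*(a+b) */
```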

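The visual appearance of a replacement list containing statements can also be misleading. For instance, when a macro whose replacement list is a sequence of statements, the last of which assigns to d, is used as the body of an if statement whose controlling expression is glob, the assignment to d is not dependent on the value of glob. This is counter to what the visual appearance of the source suggests.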
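A sketch of the problem and of bracketing with braces; the objects c and d and the macro names are hypothetical:

```c
static int c, d;

/* Two statements, not bracketed. */
#define RESET_CD c = 0; d = 0
/* The same statements enclosed in braces. */
#define RESET_CD_BRACED { c = 0; d = 0; }

static void reset_if(int glob)
{
    c = d = 1;
    if (glob)
        RESET_CD;          /* expands to: if (glob) c = 0; d = 0; */
}

static void reset_if_braced(int glob)
{
    c = d = 1;
    if (glob)
        RESET_CD_BRACED;   /* expands to: if (glob) { c = 0; d = 0; }; */
}
```

With the braced form, writing an else after the invocation is a syntax error (because of the ; following the closing brace), which is why the do { ... } while (0) form is also commonly used.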

A general solution to both of these problems is to bracket the replacement list, ensuring that the visually

expected behavior is the same as the behavior that occurs after macro replacement.

A replacement list having the form of an expression containing one or more binary operators shall be

bracketed with parentheses, unless the binary operators are only those included in the production of a

postfix-expr.

A replacement list consisting of more than one statement shall be completely enclosed in a pair of

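The visual appearance of declarations can also be deceptive when macro replacements are involved. For instance, when a replacement list denotes a pointer type, a declaration using the macro name that appears to declare several pointers declares only one.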
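A sketch of this deception, using a hypothetical macro INT_PTR, with a typedef name shown for comparison:

```c
/* Hypothetical macro whose replacement list denotes a pointer type. */
#define INT_PTR int *

/* Visually declares two pointers; after replacement the declaration is
   static int * p1, p2;  so p1 is a pointer to int but p2 is an int. */
static INT_PTR p1, p2;

/* A typedef name does not have this problem: both q1 and q2 are pointers. */
typedef int *int_ptr;
static int_ptr q1, q2;
```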


The bracketing technique cannot be used with a replacement list that represents a type (it would violate C syntax). However, using a typedef name is not a general solution; it is possible to use macro names in situations where a typedef name cannot be used. For instance, if X_TYPE is a macro whose replacement list is an integer type such as int, it is possible to modify the type denoted by X_TYPE, because the macro-expanded form represents a valid integer type when preceded by unsigned. However, the type denoted by a typedef name cannot be so modified.
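A sketch, assuming X_TYPE has the hypothetical replacement list int; the typedef equivalent is left commented out because it is not valid C:

```c
/* Hypothetical macro that names an integer type. */
#define X_TYPE int

/* Valid: after macro replacement this is  static unsigned int u;  */
static unsigned X_TYPE u;

/* The equivalent using a typedef name is not valid C if uncommented:
typedef int x_type;
static unsigned x_type w;
*/
```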

a visible form that closely resembles that seen when it appears in other contexts is small. The benefit for subsequent readers is the ability to use the same strategies to read source constructs as they use in other contexts.

There are a number of ways in which token sequences appearing in various contexts might visually resemble each other. For instance, in the definitions of ZERO_ARRAY_1 and ZERO_ARRAY_2 the layout visually associates the preprocessing tokens in the macro body with each other, while in ZERO_ARRAY_3 the preprocessing tokens in the macro body visually interact with the preprocessing tokens in the preprocessing directive.

Figure 1931.1: Number of translation units containing a given number of macro names which were macro expanded, excluding expansions that occurred while processing the contents of system headers. Based on the translated form of this book’s benchmark programs.

Table 1931.1: Detailed breakdown of the kinds of replacement lists occurring in macro definitions Adapted from Ernst, Badros,

Table 1931.2: Common macro definitions listed with an abstracted form of their replacement list (as a percentage of all macro

This sentence was added by the response to DR #306 and removes the possibility of a reader interpreting the

A preprocessing directive of the form

# define identifier lparen identifier-list opt ) replacement-list new-line

# define identifier lparen ... ) replacement-list new-line

# define identifier lparen identifier-list , ... ) replacement-list new-line

defines a function-like macro with parameters, whose use is similar syntactically to a function call.

Commentary

This defines the term function-like macro. This term is commonly used by developers. Function-like macro definitions do not contain any type information; replacement is based solely on matching preprocessing token spellings.
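Because no types are involved, one definition can be used with operands of different types. A sketch with a hypothetical macro SQUARE:

```c
/* No parameter or result types are specified, so the same definition
   works with any arithmetic operands. */
#define SQUARE(x) ((x) * (x))

static int    int_square    = SQUARE(7);    /* ((7) * (7))     */
static double double_square = SQUARE(1.5);  /* ((1.5) * (1.5)) */
```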

Limits on the number of parameters a function-like macro definition may contain are discussed elsewhere.

general form, i.e., they also apply to function-like macro definitions.

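The visual appearance of a function-like macro’s replacement list can be misleading in suggesting that an operation is performed on a parameter. For instance, when a parameter appears unparenthesized as an operand, an argument containing an operator of lower precedence is only partly affected by the operation that appears to be applied to it.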
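A sketch of such a misleading replacement list, with hypothetical macros DOUBLE_BAD and DOUBLE_OK:

```c
#define DOUBLE_BAD(x) x * 2       /* parameter not parenthesized */
#define DOUBLE_OK(x)  ((x) * 2)   /* parameter and list parenthesized */

static int bad  = DOUBLE_BAD(1 + 2);  /* 1 + 2 * 2     */
static int good = DOUBLE_OK(1 + 2);   /* ((1 + 2) * 2) */
```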

Any parameter of a function-like macro appearing as an operand in an expression shall be parenthesized,


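The visibility of the parameters also extends over the entire replacement list and is not affected by any identifiers declared within the replacement list (they are simply treated as a sequence of preprocessing tokens).

Each subsequent instance of the function-like macro name followed by a ( as the next preprocessing token introduces the sequence of preprocessing tokens that is replaced by the replacement list in the definition (an invocation of the macro).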

Commentary

No formal syntax is specified for the sequence of preprocessing tokens that form an invocation of a function-like macro. However, in some contexts the sequence of preprocessing tokens in an invocation of a function-like macro may result in undefined behavior (e.g., preprocessing tokens having the form of a preprocessing directive), or the context in which the invocation occurs may have its own syntax (e.g., preprocessing directives are ended by a new-line).

It is possible to suppress the expansion of a function-like macro by ensuring that its name is not followed by a ( preprocessing token (e.g., by enclosing the macro name in parentheses).
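A sketch of suppressing expansion by parenthesizing the name, assuming a hypothetical macro and function that share the name twice; the counter shows which one was used:

```c
#define twice(x) (2 * (x))

static int calls;   /* counts calls of the function, not of the macro */

/* The parenthesized name is not followed by (, so it is not expanded
   and this is an ordinary function definition. */
static int (twice)(int x)
{
    calls++;
    return 2 * x;
}

static int via_macro(void)    { return twice(5); }    /* macro expansion */
static int via_function(void) { return (twice)(5); }  /* function call   */
```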

Some implementations provide both function and macro definitions of some library functions. Developers wanting to ensure that the function definition is invoked parenthesize the name of the function, to prevent it being treated as a function-like macro. An occurrence of an identifier currently defined as a function-like macro and not followed by a ( preprocessing token could be a fault (in which case a translator diagnostic is likely to be generated because of a reference to an undeclared identifier), or the same identifier is used to

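The replaced sequence of preprocessing tokens is terminated by the matching ) preprocessing token, skipping intervening matched pairs of left and right parenthesis preprocessing tokens.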

Commentary

The syntax of function-like macros does not specify which right parenthesis terminates an argument list; hence the need for this wording. Skipping intervening matched pairs of left and right parenthesis preprocessing tokens allows arbitrary expressions, which may contain parentheses, to be passed as arguments. Any preprocessing tokens between the matched parentheses are treated as belonging to the argument, and not as part of the syntax of the macro invocation; for instance, a comma inside a parenthesized function call appearing in an argument does not separate arguments.
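A sketch with a hypothetical two-parameter macro ADD, whose first argument is a function call containing a comma:

```c
static int mul(int x, int y) { return x * y; }

#define ADD(x, y) ((x) + (y))

/* The comma inside mul(2, 3) lies between matched inner parentheses, so it
   does not separate macro arguments: x is mul(2, 3) and y is 4. */
static int sum(void) { return ADD(mul(2, 3), 4); }
```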


The ) preprocessing token is searched for in the source file without performing macro expansion (DR


Within the sequence of preprocessing tokens making up an invocation of a function-like macro, new-line is considered a normal white-space character.
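A sketch of an invocation spread over several lines, using a hypothetical macro ADD3:

```c
#define ADD3(x, y, z) ((x) + (y) + (z))

/* The new-lines inside the invocation are treated as ordinary white space. */
static int total(void)
{
    return ADD3(1,
                2,
                3);
}
```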

The sequence of preprocessing tokens bounded by the outside-most matching parentheses forms the list of arguments for the function-like macro.

Commentary

This introduces the common usage term arguments to refer to these preprocessing token sequences. Limits on the number of arguments that may appear in an invocation of a function-like macro are discussed elsewhere.

The individual arguments within the list are separated by comma preprocessing tokens, but comma preprocessing tokens between matching inner parentheses do not separate arguments.

An argument whose evaluation causes a side effect can sometimes result in program behavior that is surprising to developers (because they failed to take account of the argument being evaluated more than once). For instance, if the replacement list of a macro contains two occurrences of a parameter and the corresponding argument is i++, the object i is incremented twice. There are a number of possible guideline recommendations that prevent these surprises occurring; these include recommending that:

• The evaluation of arguments to macros not have side effects. Such a recommendation would require that developers be aware of whether an identifier followed by a left parenthesis results in a macro replacement or a function call. At some future time a function call may be replaced by a macro invocation, which could then require that existing code be changed to ensure that arguments did not cause side effects (this goes against the aim that adherence to guideline recommendations not require an amount of effort that is out of proportion to the changes made to existing source). Side effect related issues in other language constructs are discussed elsewhere.

• The expansion of a macro not result in a sequence of tokens that evaluate any of its arguments more than once. Syntactically it is possible to create a replacement list that follows this recommendation. However, semantically, temporary variables of the correct type need to be visible, and invoking the same macro twice in the same full expression is likely to result in undefined behavior.

When using gcc this problem can be solved by making use of two extensions: (1) the typeof operator, and (2) using the ({ }) punctuators to create an expression from a sequence of declarations and statements.
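A sketch of the gcc approach just described. MAX_TWICE and MAX_ONCE are hypothetical names; __typeof__ and statement expressions are gcc extensions (also accepted by clang), not ISO C:

```c
/* Ordinary macro: the selected argument is evaluated twice. */
#define MAX_TWICE(a, b) ((a) > (b) ? (a) : (b))

/* __typeof__ declares temporaries of the argument types, and ({ ... })
   turns the block into an expression whose value is that of its last
   expression statement, so each argument is evaluated exactly once. */
#define MAX_ONCE(a, b)                 \
    ({ __typeof__(a) max_a_ = (a);     \
       __typeof__(b) max_b_ = (b);     \
       max_a_ > max_b_ ? max_a_ : max_b_; })

static int i = 1, j = 1;

static int max_twice(void) { return MAX_TWICE(i++, 0); }  /* i incremented twice */
static int max_once(void)  { return MAX_ONCE(j++, 0); }   /* j incremented once  */
```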

The costs associated with both of these possible recommendations would be incurred for all function-like macros, while a benefit would only be obtained for a few uses. It would seem that neither of them has sufficient cost/benefit to make a guideline recommendation worthwhile.

Both of the previous possible recommendations treated the macro definition and its invocation in isolation.

Recommending that an argument causing a side effect not be passed to a macro whose corresponding

parameter is evaluated more than once is equivalent to one recommending that programs not contain faults,


possible behaviors include:

• Treating the sequence of preprocessing tokens between the matching parentheses as its argument.

• Treating those sequences of preprocessing tokens that have the form of a preprocessing directive as such a directive, i.e., defining M as an object-like macro and passing either 3 or 4 as the argument to

• Issuing a diagnostic and failing to translate the source file.

Preprocessing directives can occur within the list of arguments in automatically generated, or processed, code. For instance, an expression originally written by a developer may be expanded or split over several lines (source is often emailed, and some email programs have a lower limit on the number of characters on a line than the C Standard).

Suppose line number 123 of a source file contained a long invocation of dgay. A tool that converted C source into a form suitable for emailing might convert this to one of several forms:

• Splitting long lines is the simplest approach:

presence of line splices.

• Splitting long lines and adding #line directives, so that any diagnostic messages can be related back to the original source, might be more developer friendly:
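A sketch of the second form, treating dgay as an ordinary function (its signature here is a hypothetical choice). The #line directives inside the argument list are only well defined because dgay is not a function-like macro:

```c
static int dgay(int first, int second) { return first - second; }

static int split_call(void)
{
    /* Line 123, split by a tool, with #line directives inserted so that
       diagnostics for either fragment refer back to line 123. */
#line 123
    return dgay(100,
#line 123
                58);
}
```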

However, if dgay is defined as a macro, the behavior will be undefined.

Rationale

Committee decided to not allow any preprocessor directives to be recognized as such inside of macros.

This C specification covers preprocessing directives, not preprocessing operators (such as defined, which is not #defined). This footnote was added by the response to DR #250.

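If there is a ... in the identifier-list in the macro definition, then the trailing arguments, including any separating comma preprocessing tokens, are merged to form a single item: the variable arguments.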

Commentary

This defines the term variable arguments. The same term is also used to refer to the arguments corresponding to the ellipsis notation in a function definition. It would be more exact for the specification to say “after the” rather than “in the”.

The C preprocessor model of macro expansion is one of performing (potentially recursive) token substitution, not of interpreting sequences of commands (e.g., there is no method of iterating). This model has no existing framework for walking through a list of variable arguments, like statements in a function definition. Without completely rewriting the preprocessor specification there is little scope for anything other than the solution adopted by the C Committee. Because all of the variable arguments are formed into a single item, they can only be used in a context that treats them as a single sequence of preprocessing tokens.
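A sketch of the variable arguments being used as one item, with hypothetical macros STRINGIZE_ALL and ARRAY_OF:

```c
#include <string.h>

/* All of the variable arguments, including the commas that separate them,
   form a single item that replaces __VA_ARGS__. */
#define STRINGIZE_ALL(...) #__VA_ARGS__
#define ARRAY_OF(...) { __VA_ARGS__ }

static const char *all_text = STRINGIZE_ALL(1, 2, 3);  /* one string literal */
static const int values[] = ARRAY_OF(1, 2, 3);          /* { 1, 2, 3 }        */
```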

These guideline recommendations are driven by common developer behaviors in dealing with constructs.

This construct is new in C99 and as yet no significant experience has been gained about how developers interact with it. The specification of behavior is sufficiently different from the use of ellipsis in function prototype definitions that drawing parallels, with the aim of framing an applicable guideline recommendation, does not look possible.

Commentary

This argument specification differs from that for function definitions, in that for macros at least one argument is required to match the ... notation. If it is possible that a single argument may be passed, then the definition of the macro needs to include ... as the only parameter in its definition.

The function-like macro ONE requires at least one argument and the macro TWO requires at least two arguments.
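A sketch using hypothetical definitions of ONE and TWO that stringize their arguments, so the effect of the argument requirements is visible:

```c
#include <string.h>

#define ONE(...)    #__VA_ARGS__           /* at least one argument  */
#define TWO(a, ...) #a "|" #__VA_ARGS__    /* at least two arguments */

static const char *one_text  = ONE(x, y);     /* "x, y"                    */
static const char *two_text  = TWO(p, q, r);  /* "p" "|" "q, r"            */
static const char *two_empty = TWO(p, );      /* second argument is empty  */

/* TWO(p) has too few arguments: a C99 translator must issue a diagnostic. */
```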


6.10.3.1 Argument substitution


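146) Since, by macro-replacement time, all character constants and string literals are preprocessing tokens, not sequences possibly containing identifier-like subsequences (see 5.1.1.2, translation phases), they are never scanned for macro names or parameters.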
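A sketch of this behavior, using a hypothetical macro QUOTE_X: the x inside the string literal is part of a single preprocessing token and so is not replaced by the argument:

```c
#include <string.h>

#define QUOTE_X(x) "x"

static const char *quoted = QUOTE_X(42);   /* still "x", not "42" */
```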

Commentary

The term argument substitution is also commonly used by developers to refer to this process.

How a sequence of preprocessing tokens within a source file is split into the arguments that belong to a particular function-like macro invocation is discussed elsewhere.

A parameter in the replacement list, unless preceded by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is replaced by the corresponding argument after all macros contained therein have been expanded.

Commentary

That is, parameters occurring in the replacement list are replaced after their corresponding arguments have been macro expanded. The # and ## operator contexts are the only situations where any macros appearing in a parameter’s corresponding argument are not considered for replacement. For instance, if the argument corresponding to a parameter of F is PARAM, the expanded form of PARAM is not examined for preprocessing tokens that have the same spelling as a parameter of F.
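A sketch of the difference between an argument that is an operand of # and one that is not, using the common two-level stringizing idiom (VALUE, STR, and XSTR are hypothetical names):

```c
#include <string.h>

#define VALUE 42
/* Operand of #: the argument is not macro expanded before stringizing. */
#define STR(x)  #x
/* x is not an operand of # or ##, so the argument is fully expanded
   before being substituted into STR(x). */
#define XSTR(x) STR(x)

static const char *unexpanded = STR(VALUE);
static const char *expanded   = XSTR(VALUE);
```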

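Before being substituted, each argument’s preprocessing tokens are completely macro replaced as if they formed the rest of the preprocessing file; no other preprocessing tokens are available.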

Commentary

Each argument is expanded in isolation (i.e., there is no interaction between arguments or any other preprocessing tokens in the source file).

Completely expanding the argument requires that FM1 then be expanded (it is followed by an opening parenthesis). However, this expansion does not succeed, because there is no matching closing parenthesis in the sequence of preprocessing tokens for that argument. The behavior is undefined. Continuing on from the

the invocation of FM3 expands to 23 (the argument FM1 is not expanded further, as an argument, because it is not followed by an opening parenthesis).
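A sketch of an argument that is a function-like macro name not followed by ( within the argument; TIMES_TEN and APPLY_TO_2 are hypothetical macros:

```c
#define TIMES_TEN(x) ((x) * 10)
#define APPLY_TO_2(f) f(2)

/* TIMES_TEN is not followed by ( inside the argument, so it is not expanded
   during argument substitution. After substitution the replacement list
   reads TIMES_TEN(2), which is expanded when the result is rescanned. */
static int applied = APPLY_TO_2(TIMES_TEN);
```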


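An identifier __VA_ARGS__ that occurs in the replacement list shall be treated as if it were a parameter, and the variable arguments shall form the preprocessing tokens used to replace it.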

Commentary

This is a requirement on the implementation.

The extent to which the arguments corresponding to the parameter __VA_ARGS__ are replaced may be affected by the presence of commas. For instance, the argument in the second invocation of ELLIP_FUNC expands to F('a', 'b'), which in turn expands to

from((and other preprocessing tokens).
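A sketch of how commas in the variable arguments can take effect after replacement, using hypothetical macros PAIR_SUM and APPLY_PAIR:

```c
#define PAIR_SUM(a, b) ((a) + (b))
#define APPLY_PAIR(...) PAIR_SUM(__VA_ARGS__)

/* The variable arguments 3, 4 replace __VA_ARGS__ as one item, but when the
   result is rescanned the comma they contain separates the two arguments
   of PAIR_SUM. */
static int pair_sum = APPLY_PAIR(3, 4);
```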

Usage

Based on the visible form of the .c files, 0.26% (0.09% for the .h files) of the replacement lists of macro definitions contained a # operator. There were no obvious patterns to the usage.

Semantics

6.10.3.2 The # operator

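If, in the replacement list, a parameter is immediately preceded by a # preprocessing token, both are replaced by a single character string literal preprocessing token that contains the spelling of the preprocessing token sequence for the corresponding argument.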

Commentary

Rationale

Some pre-C89 implementations decided to replace identifiers found within a string literal if they matched a macro argument name. The replacement text is a “stringized” form of the actual argument token sequence. This practice appears to be contrary to K&R’s definition of preprocessing in terms of token sequences. The C89 Committee declined to elaborate the syntax of string literals to the point where this practice could be condoned; however, since the facility provided by this mechanism seems to be widely used, the C89 Committee introduced a more tractable mechanism of comparable power.

When a # preprocessing token appears in the expanded replacement list (for instance, as the result of macro expanding an argument), rather than in the replacement list itself, it is considered to be a punctuator rather than an operator.
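A sketch of a # that is produced by macro expansion and so acts as a punctuator; HASH_TOK, STR, and XSTR are hypothetical names, and the exact spacing inside the resulting string is not checked here:

```c
#include <string.h>

#define HASH_TOK #        /* object-like macro: this # is not an operator */
#define STR(x) #x
#define XSTR(x) STR(x)

/* The argument a HASH_TOK b is expanded to a # b before being passed to
   STR. That # did not come from a replacement list, so it is an ordinary
   punctuator and ends up inside the string rather than stringizing b. */
static const char *with_hash = XSTR(a HASH_TOK b);
```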

Microsoft C supports the preprocessor operator #@ (called the charizing operator) as an extension. It converts its operand to a character constant.

Commentary

Whether multiple white space between preprocessing tokens has already been converted to a single white space before this conversion is discussed elsewhere. Also, no white space is added where none existed in the source file.

Rationale

One problem with defining the effect of stringizing is the treatment of white space occurring in macro definitions. Where this could be discarded in the past, now upwards of one logical line may have to be retained. As a compromise between token-based and character-based preprocessing disciplines, the C89 Committee decided to permit white space to be retained as one bit of information: none or one. Arbitrary white space is replaced in the string by one space character.
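A sketch of the “none or one” rule, using a hypothetical stringizing macro STR:

```c
#include <string.h>

#define STR(x) #x

/* Each white-space sequence between tokens becomes one space, leading and
   trailing white space is deleted, and no space is added where none existed. */
static const char *spaced   = STR(  a   +   b  );
static const char *unspaced = STR(a+b);
```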

Any misconception about where white space will appear between preprocessing tokens that have been stringized is a developer education issue, not a coding guideline issue.

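Otherwise, the original spelling of each preprocessing token in the argument is retained in the character string literal, except for special handling for producing the spelling of string literals and character constants: a \ character is inserted before each " and \ character of a character constant or string literal (including the delimiting " characters), except that it is implementation-defined whether a \ character is inserted before the \ character beginning a universal character name.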

Commentary

This specification is intended to ensure that the output produced by passing the string produced by the stringize operator as an argument to printf, for instance, is the same as the visible form (with white-space characters reduced to a single space character) of the preprocessing token sequence immediately prior to being stringized (although this sequence may not exist in the visible source).
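A sketch of the escaping rule, using a hypothetical stringizing macro STR applied to an argument that contains a string literal:

```c
#include <string.h>

#define STR(x) #x

/* A \ is inserted before each " and \ of the string literal in the argument,
   so printing the result reproduces the argument's visible spelling. */
static const char *stringized = STR(x = "a\n");
```

Passing stringized to puts would print x = "a\n", matching the spelling of the argument.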

For instance, if UCNs are mapped in translation phase 1 and @ is a supported character, then the invocation of mkstr

the result of the # operator need not be "a \ b".

Most implementations simply create a sequence of characters. However, processing in subsequent phases of translation (e.g., conversion of escape sequences) may also result in undefined behavior.

Like C90, the behavior in C++ is not explicitly defined (some implementations, e.g., Microsoft C++, do not support empty arguments).

if {1} is glued to {2} and then stringized, the resulting preprocessing token is defined. However, stringizing {1} and then attempting to glue it to {2} does not yield a defined preprocessing token (the behavior is undefined).

When both operators occur in a replacement list, performing token gluing first would appear to give the highest probability of having a defined result. However, there is no requirement that implementations use this order. There is no need to specify an evaluation order for the # operator because it is unary (the evaluation order of the ## operator is discussed elsewhere).

Coding Guidelines

An order dependency on the evaluation of the # and ## operators only exists when either of them could be applied to the same preprocessing token in a replacement list. Unlike full expression evaluation, it is not possible to use parentheses to group operands with preprocessor operators; these operators have to be adjacent to the preprocessing tokens they operate on. However, this combination of events rarely occurs (there are no occurrences in the .c files) and thus a guideline recommendation is not considered worthwhile.

new token by macro argument substitution. One pre-C89 implementation replaced a comment within a macro expansion by no characters instead of the single space called for in K&R. The C89 Committee considered this practice unacceptable.

As with “stringizing,” the facility was considered desirable, but not the extant implementation of this facility,

causes concatenation of the tokens on either side of it into a new composite token.

The specification of this pasting operator is based on these principles:

• Paste operations are explicit in the source.

substituted for the formal parameter; but the actual parameter is not replaced. Given, for example


6.10.3.3 The ## operator

• Pasting does not cross macro replacement boundaries.

• The token resulting from a paste operation is subject to further macro expansion.

These principles codify the essential features of prior art and are consistent with the specification of the stringizing operator.
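A sketch of these principles using hypothetical macros: the operands of ## are not macro expanded, an extra level of macro is needed to paste an expanded argument, and the pasted token is itself subject to further expansion:

```c
#include <string.h>

#define CAT(a, b)  a ## b
#define XCAT(a, b) CAT(a, b)    /* arguments are expanded before CAT sees them */
#define SUFFIX 2
#define GREETING_EN "hello"

static int value2 = 20, valueSUFFIX = 30;

/* Operands of ## are not macro expanded: pastes value and SUFFIX. */
static int pasted_raw(void)      { return CAT(value, SUFFIX); }   /* valueSUFFIX */
/* SUFFIX is expanded to 2 before CAT is invoked. */
static int pasted_expanded(void) { return XCAT(value, SUFFIX); }  /* value2 */
/* The pasted token GREETING_EN is then itself macro expanded. */
static const char *greeting = CAT(GREETING_, EN);
```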

This defines the term placemarker. The standard uses placemarker preprocessing tokens to describe an effect; an implementation need not represent them internally. The need for a placemarker preprocessing token occurs because the ## operator does not cross replacement boundaries.
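A sketch of the effect placemarkers describe: when an argument adjacent to ## is empty, the paste leaves the other operand unchanged (C99 behavior; CAT is a hypothetical macro):

```c
#define CAT(a, b) a ## b

static int CAT(, total) = 5;   /* placemarker ## total: declares total */
static int CAT(count, ) = 7;   /* count ## placemarker: declares count */
```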

For both object-like and function-like macro invocations, before the replacement list is reexamined for more macro names to replace, each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token.

Table 1961.1: Possible results of concatenating, using the ## operator, pairs of preprocessing tokens (the one appearing in the left column followed by the one appearing in the top row) where the result might be defined (undefined denotes undefined behavior).

identifier    pp-number    punctuator    string-literal    character-constant

punctuator or undefined
