6.10.1 Conditional inclusion
• The specification has changed between C90 and C99.
The problem with any guideline recommendation is that the total cost is likely to be greater than the total
benefit (a cost is likely to be incurred in many cases and a benefit obtained in very few cases). For this reason
no recommendation is made here. The discussion on suffixed integer constants is also applicable in the
context of a conditional inclusion directive.
Commentary
The C committee recognized that developers may choose to perform different phases of translation on
different hosts. For instance, source files may be preprocessed and then distributed for further translation on
other, different, hosts.
Common Implementations
Differences between the numeric values in these two cases are rare (although cases involving ASCII and
Coding Guidelines
Making use of the numeric value of character constants is making use of representation information, which is
covered by a guideline recommendation. However, there are cases where deviations may occur.
# ifdef identifier new-line group opt
# ifndef identifier new-line group opt
check whether the identifier is or is not currently defined as a macro name.
operator. However, there does not appear to be a worthwhile cost/benefit to recommending one of the possibilities.
1886 142) Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000
is signed and positive within a #if expression even though it is unsigned in translation phase 7.
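The same effect can be sketched on a more common configuration. The following assumes a 32-bit int and a C99 translator (where #if arithmetic is performed in intmax_t and uintmax_t); here 0x80000000 plays the role that 0x8000 plays in the footnote. The names PP_SAYS_LESS and phase7_says_less are illustrative only.

```c
#include <limits.h>

/* Within #if the constant has type intmax_t, so it is signed and positive
 * and the comparison with -1 is true. */
#if -1 < 0x80000000
#define PP_SAYS_LESS 1
#else
#define PP_SAYS_LESS 0
#endif

/* In translation phase 7, assuming a 32-bit int, 0x80000000 has type
 * unsigned int; -1 is converted to UINT_MAX and the comparison is false. */
static int phase7_says_less(void)
{
    return -1 < 0x80000000;
}
```

Under these assumptions the same spelling of the expression gives different answers in the two contexts, which is the kind of difference the footnote describes.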
Commentary
The order is from the lowest line number to the highest line number.
Coding Guidelines
It may be possible to obtain some translation time performance advantage (at least for the original developer)
by appropriately ordering the directives. Unlike developer behavior with if statements, developers do not
usually aim to optimize speed of translation when deciding how to order conditional inclusion directives
(experience suggests that developers often simply append new directives to the end of any existing directives).
Recognizing a known pattern in a sequence of directives has several benefits for readers. They can make
use of any previous deductions they have made on how to interpret the directives and what they represent,
and the usage highlights common dependencies in the source. In the following code fragment more reader
effort is required to spot similarities in the sequence in which directives are checked than if both sequences of
directives had occurred in the same order.
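A minimal sketch of such a fragment, using hypothetical macro names (HOST_UNIX, HOST_WINDOWS, PATH_SEPARATOR, and LINE_TERMINATOR are assumptions made for illustration):

```c
/* Hypothetical configuration macros; in practice these are often supplied
 * through translator command line options. */
#define HOST_UNIX    1
#define HOST_WINDOWS 0

/* First sequence: HOST_UNIX is checked before HOST_WINDOWS. */
#if HOST_UNIX
#define PATH_SEPARATOR '/'
#elif HOST_WINDOWS
#define PATH_SEPARATOR '\\'
#endif

/* Second sequence: the same two macros, checked in the opposite order.
 * The outcome is the same, but a reader has to work out afresh that the
 * two sequences select on the same conditions. */
#if HOST_WINDOWS
#define LINE_TERMINATOR "\r\n"
#elif HOST_UNIX
#define LINE_TERMINATOR "\n"
#endif
```

Had the second sequence also checked HOST_UNIX first, the two sequences would be visually parallel and the shared dependency easier to spot.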
Given the lack of attention from developers on the relative ordering of directives and the benefits of using
the same ordering, where possible, a guideline recommendation appears worthwhile. However, a guideline
recommendation needs to be automatically enforceable, and determining when two sequences of directives
have the same effect, during translation, may be infeasible because information that is not contained within
the source may be required (e.g., dependencies between macro names that are likely to be defined via
translator command line options).
Where possible the visual order of evaluation of expressions within different sequences of nested
conditional inclusion directives shall be the same.
name that determines the directive in order to keep track of the level of nested conditionals;
Commentary
A parallel can be drawn with the behavior of if statements, in that if their controlling expression evaluates to
zero, during program execution, any statements in the associated block are skipped.
of nested conditionals;
Commentary
The preprocessor operates on a representation of the source written by the developer, not translated machine
code. As such it needs to perform some processing on its input to be able to deduce when to stop skipping.
Figure 1889.1: Number of top-level source files (i.e., the contents of any included files are not counted) and (right) complete
translation of this book’s benchmark programs
Directives need to be processed to keep track of the level of nesting of conditionals, and translation phases 1–3 still need to be performed (line splicing could affect what is or is not the start of a line) and characters
within a comment must not be treated as directives.
The intent of only requiring a minimum of directive processing, while skipping, is to enable partially written source code to be skipped and to allow preprocessors to optimize their performance in this special case, speeding up the rate at which the input is processed.
Example
In the following the #define directive is not well formed. But because this group is being skipped the
translator is required to ignore this fact.
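A minimal sketch of such a case; the macro BUILD_EXPERIMENTAL and the function group_was_skipped are hypothetical names introduced for illustration:

```c
#define BUILD_EXPERIMENTAL 0   /* hypothetical switch; 0 causes the group to be skipped */

#if BUILD_EXPERIMENTAL
#define 42 forty_two           /* not well formed: a macro name must be an identifier */
#endif

/* Reports which arm of the conditional was translated. */
static int group_was_skipped(void)
{
#if BUILD_EXPERIMENTAL
    return 0;
#else
    return 1;
#endif
}
```

Changing BUILD_EXPERIMENTAL to a nonzero value would cause the ill-formed directive to be processed and a diagnostic to be issued.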
This group is processed exactly as-if it appeared in the source outside of any group.
processed;
Commentary
A semantic rule to associate #else with the lexically nearest preceding #if (or similar form) directive, like
the one given for if statements, is not needed because conditional inclusion is terminated by a #endif
directive.
Like the matching #if (or similar form) directive case, all preprocessing tokens in the group are treated as
if they appeared outside of any conditional inclusion directive. Processing continues until the first #endif is
encountered (which must match the opening directive).
Coding Guidelines
The arguments made for if statements always containing an else arm might be thought to also apply to
conditional inclusion. However, the presence of a matching #endif directive reduces the likelihood that
readers will confuse which preprocessing directive any #else associates with (although other issues, such
as lack of indentation or a large number of source lines between directives, can make it difficult to visually
associate matching directives).
during preprocessing (but not in a group that is skipped). Also there is no requirement that the spelling of
the header in the C source file be represented by a source file of the same spelling. The C Standard has no
explicit knowledge of file systems and is silent on the issue of directory structures. Minimum required limits
on the implementation processing of a header name are specified elsewhere.
Failure to locate a header or source file that can be processed by the implementation (e.g., a file of the
specified name does not exist, at least along the places searched) is a constraint violation.
6.10.2 Source file inclusion
Other Languages
Most languages do not specify a #include mechanism, although many of their implementations provide one. The approach commonly used by C implementations is popular, but not universal. Some languages explicitly state that a #include directive denotes a file of the given name in the translator's host environment.
Common Implementations
For most implementations the header name maps to a file name of the same spelling. It is quite common for the translation environment to ignore the case of alphabetic letters (e.g., MS-DOS and early versions of Microsoft Windows), or to limit the number of significant characters in the file name denoted by a header name (the remaining characters being ignored). Use of the / character in specifying a full path to a file is sufficiently common usage that even host environments where this character is not normally associated with
a directory separator support such usage in header names (many Microsoft Windows translators support this character, as well as the \ character, as a directory separator).
In the majority of implementations #include directives specify files containing source in text form.
However, some implementations support what are known as precompiled headers.
It is not uncommon (over 10% of #includes in Figure 1896.1) for the same header to be #included
more than once when translating a source file (it is a requirement that implementations support this usage for standard headers). The following are some of the techniques implementations use to reduce the overhead of subsequent #includes.
• A common convention is to bracket the contents of a header, starting with the preprocessing token sequence #ifndef __H_file_name__ / #define __H_file_name__ and ending with #endif. The processing of subsequent #includes of the same header is then reduced to the minimal processing
needed to skip to the matching #endif. Some implementations (e.g., gcc) go one step further and detect headers that contain such bracketing the first time they are processed, and completely skip opening and processing the header if it is subsequently encountered again in a #include directive.
• Support the preprocessing directive #import.[359] This directive is equivalent to the #include directive except that if the specified header has already been included it is not included again.
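The bracketing convention can be sketched as follows. To keep the fragment self-contained, the contents of a hypothetical header point.h are written out twice, mimicking what the translator sees when the header is #included twice. The guard name H_POINT_H is an assumption; developer-written headers would normally avoid names beginning with two underscores, which are reserved for the implementation.

```c
/* First "inclusion" of the hypothetical header point.h. */
#ifndef H_POINT_H
#define H_POINT_H
struct point { int x, y; };
#endif

/* Second "inclusion": H_POINT_H is now defined, so everything up to the
 * matching #endif is skipped and struct point is not defined again. */
#ifndef H_POINT_H
#define H_POINT_H
struct point { int x, y; };
#endif

static int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the bracketing, the second copy would be a redefinition of struct point and a diagnostic would be required.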
Coding Guidelines
Some coding guideline documents recommend that implementation supplied headers appear before developer written headers, in a source file. Such recommendations overlook the possibility that a developer written header might itself #include an implementation header.
[Figure caption fragment: ... denote all headers (i.e., all system headers are counted), triangles denote all headers delimited by quotes (i.e., likely to be user ... form of this book’s benchmark programs.]
[Figure: axes "Unnecessary headers #include’d" and "#includes" (tick marks 1, 10, 100, 1,000); series <header> and "header".]
Experience suggests that once a #include directive appears in a source file it is rarely removed (see
Figure 1896.2) and that new #include directives are simply added after the last one. The issue of redundant
code is discussed elsewhere. There does not appear to be a worthwhile benefit in ordering #include directives in any way (apart from
any relative ordering dictated by dependencies between headers).
[Figure: axis "Rank" (tick marks 1, 10, 100, 1,000). Caption fragment: ... "header-name" gives respectively an exponent of -2.26, xmin = 8, and -1.8, xmin = 9. Based on the visible form of the .c files.]
searches a sequence of implementation-defined places for a header identified uniquely by the specified
of the header.
Commentary
File systems invariably provide a unique method of identifying every file they contain (e.g., a full path name). The base document recognized the disadvantages of requiring that the full path name be specified in each #include directive and permitted a substring of it to be given. The implementation-defined places are usually additional character sequences (e.g., directory names) added to the h-char-sequence in an attempt
to create a full path name that refers to an existing file.
Standard intends that the rules which are eventually provided by the implementor correspond as closely as possible to the original K&R rules. The primary reason that explicit rules were not included in the Standard
is the infeasibility of describing a portable file system structure. It was considered unacceptable to include UNIX-like directory rules due to significant differences between this structure and other popular commercial file system structures.
within an included file entails a search for the named file relative to the file system directory that holds the
to the same current directory. The C89 Committee decided in principle in favor of the K&R approach, but was unable to provide explicit search rules, as explained above.
is often different from that used for the form "q-char-sequence". For instance, in the <h-char-sequence>
case the contents of /usr/include might be searched first, followed by the contents of the directory containing the .c file, while in the "q-char-sequence" case the contents of the directory containing the .c file might be searched first, followed by other places.
The environment in which a translator executes may also affect the sequence of places that are searched.
For instance, the effect of relative path names (e.g., /proj/abc.h) on the identity of the current directory.
gcc searches two directories, /usr/include and another directory that holds very machine specific files,
such as stdarg.h (e.g., /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/include on your
author's computer). gcc supports the #include_next directive. This directive causes the search algorithm to
skip some of the initial implementation-defined places that would normally be searched. The initial places
that are skipped are those that were searched in locating the file containing the #include_next directive
(including the place where the search succeeded).
Tzerpos and Holt[1416] describe a well-formedness theory of header inclusion that enables unnecessary
#include directives to be deduced.
Coding Guidelines
The standard does not specify the order in which the implementation-defined places are searched. This is a
potential coding guideline issue because it is possible that an h-char-sequence will match in more than one
of the places (i.e., there is a file having the same name along several of the different possible search paths).
The behavior is thus dependent (i.e., it is assumed that the contents of the different headers will be different)
on the order in which the places are searched.
Experience suggests that the effect of a translator locating a #included file different from the one
expected to be located by the developer has one of two consequences: (1) when the contents of the file
accessed are similar to the one intended (e.g., a different version of the intended file) the source file may be
successfully translated, and (2) when the contents of the file accessed have no connection with the intended
file the source is rarely successfully translated. The problem might therefore be considered to be one of
version management, rather than the choice of characters used in an h-char-sequence. There are a number
of reasons why a solution to this issue is to not use h-char-sequences at all, including the following:
• For the < > delimited form, implementations usually look in a predefined location first (as described in
the Common Implementations section above and in the following C sentence).
Ensuring that the names chosen by developers for the headers they create are different from those of
system headers is an almost impossible task. While it might be possible to enumerate the set of names
of existing file names of system headers contained in commercially important environments, members
are likely to be added to this set on a regular basis.
Rather than trying to avoid using file names likely to match those of system headers, developers could
ensure that places containing system headers are searched last.
• The < > delimited form is often considered to denote externally supplied headers (e.g., provided by
the implementation or translator environment vendor). What constitutes a system supplied header is
open to interpretation. One distinction that can be made between system and developer headers is that
developers do not control the contents of system headers. Consequently, it can be argued that their
contents are not subject to coding guidelines.
Headers whose contents have been written by developers are subject to coding guidelines. The
convention generally adopted to indicate this status is to use the double-quote character delimited form
of #include.
Developers sometimes specify full path names in headers (see Table 1896.1). This is a configuration
management issue and is not considered to be within the scope of these coding guidelines.
translation environments. Information was automatically extracted and represents an approximate lower bound. Versions of the translation environments from approximately the same year (mid 1990s) were used. The counts for ISO C assume that the minimum set of required identifiers are declared and exclude the type generic macros.
1898 How the places are specified or the header identified is implementation-defined.
Implementations invariably search one or more predefined locations first (e.g., /usr/include), followed
by a list of alternative places. A number of techniques are used to allow developers to specify a list of alternative places to be searched for files corresponding to the headers specified in a #include directive. For instance, the alternative places may be specified via a translator command line option (e.g., -I), in a translator configuration file (e.g., gcc version 2.91.66 hosted on RedHat Linux reads many default locations from the
is still hard coded in the translator sources), or an environment variable (e.g., several Microsoft Windows based translators use INCLUDE).
The directory separator used in Unix and MS-DOS slants in different directions. Many implementations,
in both environments, recognize both characters as directory delimiters. One consequence of this is that escape sequences are not recognized as such (something that is unlikely to be a problem in header names). The RISCOS environment does not support filenames ending in .h. The implementation-defined behavior for this host is to look in a directory called h, for a file of the given name with the .h removed.
Coding Guidelines
The implementation-defined behavior associated with how the places are specified occurs outside of the source code and is the remit of configuration management guidelines. For this reason nothing further is said here.
1899
A preprocessing directive of the form
# include "q-char-sequence" new-line
causes the replacement of that directive by the entire contents of the source file identified by the specified
Commentary
The commonly accepted intent of this form of the #include directive is that it is used to reference source files created by developers (i.e., headers that are not provided as part of the implementation or host environment). The only syntactic difference between q-char-sequence and h-char-sequence is that neither sequence may contain their respective delimiters.
Most q-char-sequences end with one of two character sequences (i.e., .c or .h). The character sequence before these suffixes is often called the header name.
Other Languages
The use of double-quote as the delimiter is the almost universal form used in other languages (although some
use the ’ character because that is what is used to delimit string literals).
Coding Guidelines
The term commonly used to refer to these source files is header, the context of the conversation often being
used to distinguish any other intended usage. The intent is that the contents of these source files are controlled
by developers and as such they are subject to coding guidelines.
Commentary
While this “implementation-defined manner” might be the same as that for the < > delimited form, the intent
is for it to be sufficiently different that developers do not need to be concerned about the name of a header
created by them matching one provided as part of the implementation (and therefore potentially found by the
translator when searching for a matching header). For instance, your author does not know the names of
most of the 304 files (e.g., compface.h) contained in /usr/include on his software development computer.
Common Implementations
The search algorithm used invariably differs from that used for the < > delimited form (otherwise there would
be little point in distinguishing the two cases). The search algorithm used by some implementations is to
first look in the directory containing the source file currently being translated (which may itself have been
included). If that search fails, and the current source file has itself been included, the directory containing the
source file that #included it is then searched. This process continues back through any nested #include
directives. For instance, in:
(assuming the translation environment supports the path names used), translating the source file file_3.c
causes file_2.c to be included, which in turn includes file_3.c. The source file abc.h will be searched
for in the directories /foo, /another/path and then the directory containing file_3.c.
Some implementations use the double-quote delimited form within their system headers, to change the
default first location that is searched. For instance, a third-party API may contain the header abc.h, which
in turn needs to include ayx.h. Using the form "ayx.h" means that the implementation will search in the
directory containing abc.h first, not /usr/include. This usage can help localize the files that belong to
specific APIs. Other implementations use a search algorithm that starts with the directory containing the
original source file being translated.
If the source file is not found after these places have been searched, some implementations then search
other places specified via any translator options. Other implementations simply follow the behavior described
by the following C sentence (which has the consequence of eventually checking these other places).
# include <h-char-sequence> new-line
Commentary
The previous search can fail in the sense that it does not find a matching source file.
Some existing code uses the double-quote delimited form of the #include directive to include headers provided by the implementation (rather than the < > delimited form). This requirement ensures that such code continues to be conforming.
A preprocessing directive of the form
# include pp-tokens new-line
(that does not match one of the two previous forms) is permitted.
Commentary
This implementation-defined behavior may take a number of forms, including:
• The ## operator can be used to glue preprocessing tokens together. However, the behavior is undefined
if the resulting character sequence is not a valid preprocessing token. For instance, the five preprocessing
tokens {<} {string} {.} {h} {>} cannot be glued together to form a valid preprocessing token
without going through intermediate stages whose behavior is undefined.
• Creating a preprocessing token, via macro expansion, having the double-quote delimited form (i.e., a
string preprocessing token) need not depend on any implementation-defined behavior. The stringize
does the implementation strip off the space character at the ends of the delimited character sequence?
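A minimal sketch of both approaches, without using the ## operator. The macro names are hypothetical; the first #include relies on the implementation-defined way in which the expanded tokens are combined into a header name, while the second builds a string preprocessing token using the # operator.

```c
/* Form 1: the macro expands to the tokens < limits . h >, which the
 * implementation combines into a header name in an implementation-defined
 * manner. */
#define LIMITS_HEADER <limits.h>
#include LIMITS_HEADER

/* Form 2: the header name is built as a string preprocessing token. The
 * extra macro level causes STRING_HEADER to be expanded before the #
 * operator is applied. */
#define STRINGIZE(x) #x
#define EXPAND_AND_STRINGIZE(x) STRINGIZE(x)
#define STRING_HEADER string.h
#include EXPAND_AND_STRINGIZE(STRING_HEADER)

static int char_bits(void)
{
    return CHAR_BIT;            /* declared by <limits.h> */
}

static size_t abc_length(void)
{
    return strlen("abc");       /* declared by <string.h> */
}
```

The second form yields "string.h", which is searched for using the double-quote rules and then, if not found, as if the < > delimited form had been used.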
Commentary
This C sentence and the following ones in this C paragraph are a specification of the minimum set of
requirements that an implementation must meet. For sequences outside of this set the implementation mapping
may be non-unique (like, for instance, the Microsoft Windows technique of mapping files ending in .html to
.htm). The handling of character sequences that resemble UCNs may also differ, e.g., "\ubada\file.txt"
(Ubada is a city in Tanzania and BADA is the Hangul symbol ᄇ ᅮ ᇁ in ISO 10646). The standard does not
permit any number of period characters because many operating systems do not permit them (at least one,
RISCOS, does not permit any).
The wording was changed by the response to DR #302 to extend the specification to be more consistent
with C++.
C++
16.2p5
The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10) followed
by a period (.) and a single nondigit.
Other Languages
Other languages are either specified to operate within the same operating system and file system limitations as
C, and as such have to deal with the same issues, or require an integrated development environment to be created before they can be used.
Common Implementations
Implementations invariably pass the sequence of characters that appear between the delimiters (when searching other places a directory path may be added) as an argument in a call to fopen or an equivalent system function. The called library function will eventually call some host operating system function that interfaces
to the host file system. The C translator’s behavior is thus controlled by the characteristics of the host file system and how it maps character sequences to file names. The handling of the period character varies between file systems; known behaviors include:
• Unix based file systems permit more than one period in a file name.
• MS-DOS based file systems only permit a single period in a file name.
• RISCOS, an operating system for the Acorn ARM processor, does not support filenames that contain
a period. For this host, file names containing a period that were specified in a #include directive were mapped using a directory structure. All file names ending in the characters .h were searched for in a directory called h.
Coding Guidelines
Because an implementation is not required to provide a unique mapping for all sequences it is possible that
an unintended header or source file will be accessed, or the translator will fail to identify a known header or source file. The possible consequences of an unintended access are discussed elsewhere, while failure to
with using character sequences having a unique mapping in the different environments that the source may
be translated in is outside the scope of these coding guidelines.
1910 The first character shall not be a digit.
Commentary
These permissions reflect known characteristics of file systems in which translators are executed.
C90
The limit specified by the C90 Standard was six significant characters. However, implementations invariably
used the number of significant characters available in the host file system (i.e., they do not artificially limit the
number of significant characters). It is unlikely that a header or source file will fail to be identified because
of a difference in what used to be a non-significant character.
C++
The C++ Standard does not give implementations any permissions to restrict the number of significant
characters before the period (16.1p5). However, the limits of the file system used during translation are likely
to be the same for both C and C++ implementations and consequently no difference is listed here.
Common Implementations
All file systems place some limits on the number of characters in a source file name— for instance:
• Most versions of the Microsoft DOS environment ignore the distinction of alphabetic case and restrict
the mapping to eight significant characters before any period (and a maximum of three after it).
• POSIX requires that at least 14 characters be significant in a file name (it also requires implementations
to support at least 255 characters in a pathname). Many Linux file systems support up to 255 characters
in a filename and 4095 characters in a pathname.
Coding Guidelines
The potential problems associated with limits on the sequences of characters that are likely to be treated as unique
are a configuration management issue that is outside the scope of these coding guidelines.
1912 A #include preprocessing directive may appear in a source file that has been read because of a #include
directive in another file, up to an implementation-defined nesting limit (see 5.2.4.1).
Commentary
Thus #include directives can be nested within source files whose contents have themselves been #included.
This issue is discussed elsewhere. While this permission only applies to source files, an implementation
using some form of precompiled headers (which are not source files within the standard’s definition of the
term) that did not support this functionality would not be popular with developers.
#include <stdio.h>
#include "myprog.h"
Other Languages
Some languages only have a single form of #include directive for all headers.
6.10.3 Macro replacement
Commentary
This example does not illustrate any benefit compared to that obtained from placing separate #include
directives in each arm of the conditional inclusion directive.
1915
Forward references: macro replacement (6.10.3).
1916 145) Note that adjacent string literals are not concatenated into a single string literal (see the translation
Commentary
This is actually a definition in a Constraints clause (it is used by two constraints in this C subsection).
The check against same spelling only needs to take into account the significant characters of an identifier.
• Interfere with existing code as little as possible.
• Keep the preprocessing model simple and uniform.
• Allow macros to be used wherever functions can be.
• Define macro expansion such that it produces the same token sequence whether the macro calls appear in open text, in macro arguments, or in macro definitions.
Preprocessing is specified in such a way that it can be implemented either as a separate text-to-text prepass
or as a token-oriented portion of the compiler itself. Thus, the preprocessing grammar is specified in terms of tokens.
Commentary
There was an existing body of code, containing redefinitions of the same macro, when the C Standard
was first written The C committee did not want to specify that existing code containing such usage was
non-conforming, but they did consider the case where the bodies of any subsequent definitions differed to be
Any subsequent #undef of the macro name popping this stacked definition and making it the current one.
Coding Guidelines
C permits more than one definition of the same macro name, with the same body, and more than one external
definition of the same object, with the same type, and the coding guideline issues are the same for both (in
both cases translators are not always required to issue a diagnostic if the definitions are considered to be
different).
In both cases a technique for avoiding duplicate definitions, during translation but not in the visible source,
is to bracket definitions with #ifndef MACRO_NAME / #endif (in the case of the file scope object a macro
name needs to be created and associated with its declaration). Using this technique has the disadvantage that
it prevents the translator checking that any subsequent redeclarations of an identifier are the same (unless the
bracketing occurs around the only textual declaration that occurs in any source file used to build a program).
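A minimal sketch of this disadvantage, using the hypothetical macro name BUFFER_SIZE: the second, bracketed definition has a different body, but because it is skipped the translator never gets the opportunity to diagnose the difference.

```c
#define BUFFER_SIZE 128

/* Elsewhere, perhaps in another (hypothetical) header: */
#ifndef BUFFER_SIZE
#define BUFFER_SIZE 256   /* different body, but never seen by the translator */
#endif

static int buffer_size(void)
{
    return BUFFER_SIZE;
}
```

Without the #ifndef/#endif bracketing, the second definition would be a constraint violation and a diagnostic would be issued.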
preprocessing directive unless the second definition is a function-like macro definition that has the same
number and spelling of parameters, and the two replacement lists are identical.
Commentary
The issues are the same as for object-like macros, with the addition of checks on the parameters. Requiring
that the parameters be spelled the same, rather than, for instance, that they have an identical effect, simplifies
the similarity checking of two macro bodies. For instance, in:
a translator is not required to deduce that the two definitions of FM are structurally identical.
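A minimal sketch of the kind of case described, with an illustrative body for FM (the body and the function fm_of_two are assumptions). The differing definition is shown inside a comment so that the fragment can be translated without a diagnostic.

```c
#define FM(a) ((a) + 1)
#define FM(a) ((a) + 1)   /* permitted: same parameter spelling, identical replacement list */

/* A constraint violation if uncommented: the parameter is spelled
 * differently, even though the macro would behave identically.
 *
 * #define FM(b) ((b) + 1)
 */

static int fm_of_two(void)
{
    return FM(2);
}
```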
macro.
Commentary
In the following (assuming $ is a member of the extended character set and permitted in an identifier
preprocessing token):
Correction Add to subclause 6.8, page 86 (Constraints):
In the definition of an object-like macro, if the first character of a replacement list is not a character required by subclause 5.2.1, then there shall be white-space separation between the identifier and the replacement list.*
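[Footnote *: This allows an implementation to choose to interpret the directive:
#define THIS$AND$THAT(a, b) ((a) + (b))
as defining a function-like macro THIS$AND$THAT, rather than an object-like macro THIS. Whichever choice it
makes, it must also issue a diagnostic.]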
However, the complex interaction between this specification and UCNs was debated during the C9X review process and it was decided to simplify the requirements to the current C99 form.
If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undefined.
The behavior of the following was discussed in DR #003q3, DR #153, and raised against C99 in DR #259 (no committee response was felt necessary).
What was undefined behavior in C90 (an empty argument) is now explicitly supported in C99. The two most likely C90 translator undefined behaviors are either to support them (existing source developed using such a translator may contain empty arguments in a macro invocation), or to issue a diagnostic (existing source developed using such a translator will not contain any empty arguments in a macro invocation).
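A minimal sketch of an empty argument under C99 rules follows. The macro DECL and the object names are hypothetical; the first invocation passes no preprocessing tokens for the qualifier parameter.

```c
/* DECL, counter and hidden are illustrative names. */
#define DECL(qualifier, name) qualifier int name

DECL(, counter) = 3;        /* empty first argument: int counter = 3; */
DECL(static, hidden) = 4;   /* static int hidden = 4; */

int empty_arg_demo(void)
{
    return counter + hidden;
}
```

A C90 translator was free to reject the first invocation, which is why source that relies on it may not be portable to older translators.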
C++
The C++ Standard contains the same wording as the C90 Standard.
C++ translators are not required to correctly process source containing macro invocations having any empty arguments.
Common Implementations
Some C90 implementations (e.g., gcc) treated empty arguments as an argument containing no preprocessing
tokens, while others (e.g., Microsoft C) treated an empty argument as being a missing argument (i.e., a
when the trailing arguments are included in a list of arguments to another macro or function. For example, if
#define dprintf(format, ...) \
dfprintf(stderr, format, __VA_ARGS__)
and it were allowed for there to be only one argument, then there would be a trailing comma in the expanded
form. While some implementations have used various notations or conventions to work around this problem,
the Committee felt it better to avoid the problem altogether.
While some developers may be confused because the requirements on the number of arguments are different
from functions defined using the ellipsis notation, passing too few arguments is a constraint violation (i.e.,
translators are required to issue a diagnostic that a developer then needs to correct).
) terminates it
Commentary
While this requirement is specified in the syntax, it is interpreted as requiring the ) preprocessing token to
occur before any macro replacement of the identifiers following the matching ( preprocessing token. For
instance, when an argument contains a macro, R_PAREN, whose replacement list is ),
the invocation is terminated by the ) preprocessing token that occurs immediately before ;, not the expanded
form of R_PAREN.
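The original listing did not survive extraction. The following is a hypothetical reconstruction of the idea, with FUNC chosen as an illustrative name: the argument list is delimited using the tokens as written, and R_PAREN is expanded only afterwards, as part of the argument.

```c
/* Hypothetical reconstruction; FUNC is an illustrative name. */
#define R_PAREN )
#define FUNC(a) a

int paren_demo(void)
{
    /* The argument list of FUNC ends at the ")" written just before ";",
       so the argument is the token sequence: 1 R_PAREN
       After replacement the statement reads: return (1 + 1 ); */
    return (1 + FUNC(1 R_PAREN );
}
```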
ellipsis notation in the parameters.
Commentary
This requirement simplifies a translator's processing of occurrences of the identifier __VA_ARGS__.
This typographical correction was made by the response to DR #234.
C90
Support for __VA_ARGS__ is new in C99.
Source code declaring an identifier with the spelling __VA_ARGS__ will cause a C99 translator to issue a diagnostic (the behavior was undefined in C90).
one name space
Commentary
Object-like and function-like macro names exist in the same name space. However, an identifier defined as
a function-like macro is only treated as such when its name is followed by an opening parenthesis.
considered part of the replacement list for either form of macro.
Commentary
Specifying that such white-space should be considered to be part of the replacement list has potential
maintenance and comprehension costs (it restricts how the start of the replacement list may be indented and
white-space following the replacement list is not immediately visible to readers) for no obvious benefit.
directive could begin, the identifier is not subject to macro replacement.
Commentary
This is a special case of a more general specification given elsewhere.
Common Implementations
Some preprocessors used to perform this kind of replacement (some past entries in the Obfuscated C
contest [642] relied on such translator behavior).
Example
In the following, even though the identifier define is defined as a macro, the line starting #define is still
processed as a macro definition directive, and not as a #undef directive.
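The original listing is missing here. A minimal sketch of the behavior described, using the hypothetical macro name LIMIT, might look like this:

```c
#define define undef   /* "define" is now a macro name */

/* The directive name is not subject to macro replacement, so this line is
   still a macro definition and not "#undef LIMIT 10". */
#define LIMIT 10

int limit_value(void)
{
    return LIMIT;
}
```

If the directive name were replaced, LIMIT would be undefined and the function would fail to translate.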
object-like
# define identifier replacement-list new-line
the replacement list of preprocessing tokens that constitute the remainder of the directive.
The preprocessing tokens in a text-line are unconditionally scanned for instances of macro names to
expand, as are preprocessing tokens in some preprocessing directives.
The standard lists a few restrictions on identifiers that can be defined as macro names. The issue of predefined
Implementations invariably provide a mechanism that is external to the source code for defining macros, e.g., the -D command line option.
• Parameterizing the definition of a type. This issue is discussed in more detail under typedef names.
end to represent the C punctuators { and } respectively (this existing usage was one reason these macro names were not used as alternative spellings, in <iso646.h>, for these punctuators; it could have rendered
existing conforming code nonconforming), or a developer wanting to modify existing code to use greater floating-point precision might define the macro name float to be double.
The growth in the usage of languages with C-like syntax over the last 10 years means that these days it is rare for developers to attempt to change the visual appearance of C source to be closer to a language they are more familiar with. While a macro name that maps to a C token may be surprising to readers of the source, it
is unlikely to conflict with their existing C knowledge, and therefore might be considered at worst a minor inconvenience (i.e., cost).
Defining a macro whose name is the same as a keyword means that the behavior of translated source can
be very different from that expected from its visual appearance (such usage also results in undefined behavior
if the definition occurs prior to the inclusion of any library header). The presence of such a definition requires
that readers substitute their existing, default response, knowledge of behavior for a new behavior (assuming
that they had noticed the definition of the macro). Experience suggests that the short-term benefit of defining
and using such macro names is less than the longer term (which may be only a few days) costs associated
with comprehension and miscomprehension of the affected source.
A source file shall not define a macro name to have the spelling of a keyword.
Replacement lists may look innocuous enough when viewed in isolation. However, in the context in which
they occur the expanded form may interact in unexpected ways with adjacent tokens. For instance, looking at
the components of the following source in isolation:
the appearance of the replacement list of SUM suggests that a will be added to b and looking at the use of
SUM in the initialization of loc suggests that it will be multiplied by the value of glob. However, the token
sequence after macro replacement is glob*a+b, which has a very different interpretation.
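The listing this paragraph refers to is missing from this copy. A sketch consistent with the names used in the text (SUM, a, b, glob, loc) follows; SUM_BR and the two function names are hypothetical additions that show the bracketed form for comparison.

```c
#define SUM     a+b      /* unbracketed replacement list */
#define SUM_BR  (a+b)    /* bracketed variant, added for comparison */

int unbracketed(int glob, int a, int b)
{
    int loc = glob * SUM;      /* expands to glob * a+b */
    return loc;
}

int bracketed(int glob, int a, int b)
{
    int loc = glob * SUM_BR;   /* expands to glob * (a+b) */
    return loc;
}
```

With glob = 4, a = 2, b = 3 the first function computes 4*2+3 and the second 4*(2+3).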
The visual appearance of a replacement list containing statements can also be misleading. For instance, in:
the assignment to d is not dependent on the value of glob, which is counter to what the visual appearance of
the source suggests.
A general solution to both of these problems is to bracket the replacement list, ensuring that the visually
expected behavior is the same as the behavior that occurs after macro replacement.
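For the statement case, a minimal sketch under assumed names (SET_BOTH, SET_BOTH_BR, c, and the two functions are hypothetical; d and glob follow the text) shows how enclosing the replacement list in braces restores the visually expected behavior:

```c
#define SET_BOTH     c = 1; d = 2        /* two statements, not enclosed */
#define SET_BOTH_BR  { c = 1; d = 2; }   /* enclosed in a pair of braces */

int d_after_unbracketed(int glob)
{
    int c = 0, d = 0;
    if (glob)
        SET_BOTH;      /* only "c = 1;" is controlled by the if */
    (void)c;
    return d;
}

int d_after_bracketed(int glob)
{
    int c = 0, d = 0;
    if (glob)
        SET_BOTH_BR    /* both assignments are controlled by the if */
    (void)c;
    return d;
}
```

The braces form has its own cost: a following semicolon becomes an empty statement, which matters if an else follows.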
A replacement list having the form of an expression containing one or more binary operators shall be
bracketed with parentheses, unless the binary operators are only those included in the production of a
postfix-expr.
A replacement list consisting of more than one statement shall be completely enclosed in a pair of
The visual appearance of declarations can also be deceptive when macro replacements are involved. For
instance, in:
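The original listing is missing. One commonly cited case, offered here as an assumption rather than as the author's example, is a macro that expands to a pointer type; INT_PTR and the function name are hypothetical, and _Generic (C11) is used only to report the type of q.

```c
#define INT_PTR int *

int q_is_pointer(void)
{
    INT_PTR p = 0, q = 0;   /* expands to: int * p = 0, q = 0; */
    (void)p;
    /* q has type int, not int *, despite the visual appearance. */
    return _Generic(q, int: 0, int *: 1);
}
```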
The bracketing technique cannot be used with a replacement list that represents a type (it would violate
C syntax). However, using a typedef name is not a general solution; it is possible to use macro names in situations where a typedef name cannot be used. For instance, in:
it is possible to modify the type denoted by X_TYPE because the macro expanded form represents a valid integer type when preceded by unsigned. However, the type denoted by a typedef name cannot be so
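A sketch of the X_TYPE case follows; the typedef name x_type_t and the function name are hypothetical. The typedef form is shown only in a comment because it does not translate.

```c
#define X_TYPE int
typedef int x_type_t;

int macro_type_is_unsigned(void)
{
    unsigned X_TYPE u = 0;    /* expands to: unsigned int u = 0; */
    /* unsigned x_type_t v;      not permitted: a typedef name cannot be
                                 combined with other type specifiers */
    u = u - 1;                /* wraps to UINT_MAX */
    return u > 0;
}
```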
a visible form that closely resembles that seen when it appears in other contexts is small The benefit for subsequent readers is the ability to use the same strategies to read source constructs as they use in other contexts.
There are a number of ways in which token sequences appearing in various contexts might visually resemble each other. For instance, in the following definitions both ZERO_ARRAY_1 and ZERO_ARRAY_2
visually associate preprocessing tokens in the macro body, while in ZERO_ARRAY_3 preprocessing tokens in the macro body visually interact with the preprocessing tokens in the preprocessing directive.
Figure 1931.1: Number of translation units containing a given number of macro names which were macro expanded, excluding
expansions that occurred while processing the contents of system headers. Based on the translated form of this book’s benchmark
programs.
Table 1931.1: Detailed breakdown of the kinds of replacement lists occurring in macro definitions Adapted from Ernst, Badros,
Table 1931.2: Common macro definitions listed with an abstracted form of their replacement list (as a percentage of all macro
This sentence was added by the response to DR #306 and removes the possibility of a reader interpreting the
function-like macro
# define identifier lparen identifier-list opt ) replacement-list new-line
# define identifier lparen ... ) replacement-list new-line
# define identifier lparen identifier-list , ... ) replacement-list new-line
defines a function-like macro with parameters, whose use is similar syntactically to a function call.
Commentary
This defines the term function-like macro. This term is commonly used by developers. Function-like macro definitions do not contain any type information. Replacement is based solely on matching preprocessing token spellings.
Limits on the number of parameters a function-like macro definition may contain are discussed elsewhere.
general form, i.e., they also apply to function-like macro definitions.
The visual appearance of a function-like macro’s replacement list can be misleading in suggesting that an operation is performed on a parameter For instance, in:
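The original listing is missing. A hypothetical sketch of the problem (NEGATE, NEGATE_P, and the function names are illustrative) shows an operator that appears to apply to the whole parameter but, after substitution, applies only to part of the argument:

```c
#define NEGATE(x)    (-x)      /* parameter not parenthesized */
#define NEGATE_P(x)  (-(x))    /* parameter parenthesized */

int negate_unparenthesized(int a, int b)
{
    return NEGATE(a - b);      /* expands to (-a - b) */
}

int negate_parenthesized(int a, int b)
{
    return NEGATE_P(a - b);    /* expands to (-(a - b)) */
}
```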
Any parameter of a function-like macro appearing as an operand in an expression shall be parenthesized,
The visibility of the parameters also extends over the entire replacement list and is not affected by any
identifiers declared within the replacement list (they are simply treated as a sequence of preprocessing tokens
invocation of the macro).
Commentary
No formal syntax is specified for the sequence of preprocessing tokens that form an invocation of a
function-like macro. However, in some contexts the sequence of preprocessing tokens in an invocation of a
function-like macro may result in undefined behavior (e.g., preprocessing tokens having the form of a preprocessing directive),
or the context in which the invocation occurs may have its own syntax (e.g., preprocessing directives are
ended by
It is possible to suppress the expansion of a function-like macro by ensuring that it is not followed by
a ( preprocessing token (e.g., by enclosing the macro name in parentheses):
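A minimal sketch, assuming a hypothetical function twice that is also defined as a function-like macro with deliberately different behavior so the two can be told apart:

```c
static int twice(int x)         /* function definition precedes the macro */
{
    return 2 * x;
}

#define twice(x) ((x) + 100)    /* macro of the same name */

int via_macro(void)
{
    return twice(1);            /* name followed by "(": macro is expanded */
}

int via_function(void)
{
    return (twice)(1);          /* name followed by ")": function is called */
}
```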
Some implementations provide both function and macro definitions of some library functions. Developers
wanting to ensure that the function’s definitions are invoked parenthesize the name of the function to prevent
it being treated as a function-like macro. An occurrence of an identifier currently defined as a function-like
macro and not followed by a ( preprocessing token could be a fault (in which case a translator diagnostic is
likely to be generated because of a reference to an undeclared identifier), or the same identifier is used to
syntax
intervening matched pairs of left and right parenthesis preprocessing tokens.
Commentary
The syntax of function-like macros does not specify which right parenthesis terminates an argument
list. Hence the need for this wording. Skipping intervening matched pairs of left and right parenthesis
preprocessing tokens allows arbitrary expressions, which may contain parentheses, to be passed as
arguments. Any preprocessing tokens between the matched parentheses are treated as belonging to the
argument and not part of the syntax of the macro invocation. For instance:
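The original listing is missing. A sketch with hypothetical names (FIRST, add) illustrates that a comma and a right parenthesis inside a matched pair belong to the argument:

```c
#define FIRST(a, b) (a)

static int add(int x, int y)
{
    return x + y;
}

int first_demo(void)
{
    /* Two arguments: "add(1, 2)" and "0". The comma and the ")" inside
       the nested parentheses do not separate or terminate arguments. */
    return FIRST(add(1, 2), 0);
}
```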
The ) preprocessing token is searched for in the source file without performing macro expansion (DR
1937 Within the sequence of preprocessing tokens making up an invocation of a function-like macro, new-line is considered a normal white-space character.
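As a small illustration (ADD3 and add3_demo are hypothetical names), an invocation may be spread over several source lines without line splices:

```c
#define ADD3(a, b, c) ((a) + (b) + (c))

int add3_demo(void)
{
    /* The new-lines between the arguments are ordinary white space. */
    return ADD3(1,
                2,
                3);
}
```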
arguments
arguments for the function-like macro.
Commentary
This introduces the common usage term arguments to refer to these preprocessing token sequences. Limits
on the number of arguments that may appear in an invocation of a function-like macro are discussed elsewhere.
1939 The individual arguments within the list are separated by comma preprocessing tokens, but comma prepro-macro
Coding Guidelines
An argument whose evaluation causes a side effect can sometimes result in program behavior that is surprising
to developers (because they failed to take account of the argument being evaluated more than once). For instance, in the following fragment:
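The fragment itself is missing from this copy. A hypothetical stand-in using the familiar MAX macro (the macro and function names are assumptions; the object name i follows the text):

```c
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int i_after_max(void)
{
    int i = 5;
    /* Expands to ((i++) > (3) ? (i++) : (3)): i++ is evaluated once for
       the comparison and once more to produce the result. */
    int m = MAX(i++, 3);
    (void)m;
    return i;
}
```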
the object i is incremented twice. There are a number of possible guideline recommendations that prevent these surprises occurring; these include recommending that:
• The evaluation of arguments to macros not have side effects. Such a recommendation would require
that developers be aware of whether an identifier followed by a left parenthesis results in a macro
replacement or a function call. At some future time a function call may be replaced by a macro
invocation, which could then require that existing code be changed to ensure that arguments did not
cause side effects (this goes against the aim that adherence to guideline recommendations not require
an amount of effort that is out of proportion to the changes made to existing source).
Side effect related issues in other language constructs are discussed elsewhere.
• The expansion of a macro not result in a sequence of tokens that evaluate any of its arguments more
than once. Syntactically it is possible to create a replacement list that follows this recommendation.
However, semantically temporary variables of the correct type need to be visible and invoking the
same macro twice in the same full expression is likely to result in undefined behavior.
When using gcc this problem can be solved by making use of two extensions, (1) the typeof operator
and (2) using the ({ }) punctuators to create an expression from a sequence of declarations and
statements. For instance:
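The original listing is missing. The following sketch assumes a translator that supports the GNU C extensions named above (gcc and compatible compilers); MAX_ONCE, a_, and b_ are hypothetical names, and __typeof__ is used as the spelling that is accepted regardless of the selected language standard.

```c
/* Requires GNU C extensions: __typeof__ and statement expressions. */
#define MAX_ONCE(a, b)              \
    ({ __typeof__(a) a_ = (a);      \
       __typeof__(b) b_ = (b);      \
       a_ > b_ ? a_ : b_; })

int i_after_max_once(void)
{
    int i = 5;
    int m = MAX_ONCE(i++, 3);   /* i++ is evaluated exactly once */
    return i * 10 + m;          /* i is 6 and m is 5 */
}
```

The temporaries are declared inside the statement expression, so no suitably typed variables need to be visible at the point of invocation.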
The costs associated with both of these possible recommendations would be incurred for all function-like
macros, while a benefit would only be obtained for a few uses. It would seem that neither of them has
sufficient cost/benefit to make a guideline recommendation worthwhile.
Both of the previous possible recommendations treated the macro definition and its invocation in isolation.
Recommending that an argument causing a side effect not be passed to a macro whose corresponding
parameter is evaluated more than once is equivalent to one recommending that programs not contain faults,
possible behaviors include:
• Treating the sequence of preprocessing tokens between the matched parentheses as its argument.
• Treating those sequences of preprocessing tokens that have the form of a preprocessing directive as such a directive, i.e., defining M an object-like macro and passing either 3 or 4 as the argument to
• Issuing a diagnostic and failing to translate the source file.
Preprocessing directives can occur within the list of arguments in automatically generated, or processed,
code. For instance, an expression originally written by a developer may be expanded or split over several lines
(source is often emailed and some email programs have a lower limit on the number of characters on a line
than the C Standard).
Suppose line number 123 of a source file contained
a tool that converted C source into a form suitable for emailing might convert this to one of several forms:
• Splitting long lines is the simplest approach:
presence of line splices.
• Splitting long lines and adding #line directives so that any diagnostic messages can be related back to the original source might be more developer friendly:
However, if dgay is defined as a macro the behavior will be undefined.
Rationale
Committee decided to not allow any preprocessor directives to be recognized as such inside of macros.
This C specification covers preprocessing directives, not preprocessing operators (such as defined).
This footnote was added by the response to DR #250.
comma preprocessing tokens, are merged to form a single item: the variable arguments.
Commentary
This defines the term variable arguments. The same term is also used to refer to the arguments corresponding
to the ellipsis notation in a function definition. It would be more exact for the specification to say “after the”
rather than “in the”.
The C preprocessor model of macro expansion is one of performing (potentially recursive) token substitution,
not of interpreting sequences of commands (e.g., there is no method of iterating). This model
has no existing framework for walking through a list of variable arguments, like statements in a function
definition. Without completely rewriting the preprocessor specification there is little scope for anything other
than the solution adopted by the C Committee. Because all of the variable arguments are formed into a single
item they can be used in a context that treats them as a single sequence of preprocessing tokens.
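A minimal sketch of such a context follows: the variable arguments are forwarded, commas included, as one token sequence to a function that takes a variable number of arguments. FORMAT_INTO and format_matches are hypothetical names; snprintf and strcmp are the standard library functions.

```c
#include <stdio.h>
#include <string.h>

/* The whole of __VA_ARGS__ is pasted into the call as a single item;
   the macro cannot inspect or iterate over the individual arguments.
   buf must be an array (not a pointer) for sizeof to give its size. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof (buf), __VA_ARGS__)

int format_matches(const char *expected)
{
    char text[32];
    FORMAT_INTO(text, "%d-%s", 7, "x");
    return strcmp(text, expected) == 0;
}
```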
These guideline recommendations are driven by common developer behaviors in dealing with constructs.
This construct is new in C99 and as yet no significant experience has been gained about how developers
interact with it. The specification of behavior is sufficiently different from the use of ellipsis in function
prototype definitions that drawing parallels, with the aim of framing an applicable guideline recommendation,
does not look possible.
Commentary
This argument specification differs from that for function definitions, in that for macros at least one argument
is required to match the ... notation. If it is possible that a single argument may be passed then the definition
of the macro needs to include ... as the only parameter in its definition.
The function-like macro ONE requires at least one argument and the macro TWO requires at least two arguments.
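The definitions of ONE and TWO are missing from this copy. A hypothetical pair with the stated properties under C99 rules (the replacement lists and function names are assumptions made for illustration) might be:

```c
/* ONE: the ellipsis is the only parameter. Counts its int arguments. */
#define ONE(...)         (sizeof((int[]){ __VA_ARGS__ }) / sizeof(int))

/* TWO: one named parameter plus the ellipsis, so C99 requires at least
   two arguments in an invocation. */
#define TWO(first, ...)  ((first) + (int)ONE(__VA_ARGS__))

int one_demo(void)
{
    return (int)ONE(4, 5, 6);   /* three variable arguments */
}

int two_demo(void)
{
    return TWO(10, 4, 5);       /* 10 plus a count of two */
}
```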
6.10.3.1 Argument substitution
1943 146) Since, by macro-replacement time, all character constants and string literals are preprocessing tokens,
Commentary
The term argument substitution is also commonly used by developers to refer to this process.
How a sequence of preprocessing tokens within a source file are split into the arguments that belong to a particular function-like macro invocation is discussed elsewhere.
preprocessing token (see below), is replaced by the corresponding argument after all macros contained
therein have been expanded.
Commentary
That is, parameters occurring in the replacement list are replaced after their corresponding arguments have
been macro expanded. The # and ## operator contexts are the only situations where any macros appearing in
a parameter’s corresponding argument are not considered for replacement. In the following the expanded
form of PARAM is not examined for preprocessing tokens that have the same spelling as a parameter of F.
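The difference between an ordinary parameter and an operand of # can be sketched with the widely used two-level stringize idiom. The names VALUE, STR, XSTR, and the two functions are hypothetical and this is not the listing the text refers to.

```c
#include <string.h>

#define VALUE   42
#define STR(x)  #x        /* x is an operand of #: argument not expanded */
#define XSTR(x) STR(x)    /* x is an ordinary parameter: argument expanded first */

int direct_is_name(void)
{
    return strcmp(STR(VALUE), "VALUE") == 0;
}

int indirect_is_value(void)
{
    return strcmp(XSTR(VALUE), "42") == 0;
}
```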
formed the rest of the preprocessing file;
Commentary
Each argument is expanded in isolation (i.e., there is no interaction between arguments or any other
preprocessing tokens in the source file).
Completely expanding the argument requires that FM1 then be expanded (it is followed by an opening
parenthesis). However, this expansion does not succeed because there is no matching closing parenthesis, in
the sequence of preprocessing tokens for that argument. The behavior is undefined. Continuing on from the
the invocation of FM3 expands to 23 (the argument FM1 is not expanded further, as an argument, because it is
not followed by an opening parenthesis).
the variable arguments shall form the preprocessing tokens used to replace it.
Commentary
This is a requirement on the implementation.
The extent to which the arguments corresponding to the parameter __VA_ARGS__ are replaced may be affected by the presence of commas. For instance, in:
the argument in the second invocation of ELLIP_FUNC expands to F('a', 'b') which in turn expands to
from((and other preprocessing tokens).
Usage
Based on the visible form of the .c files, 0.26% (0.09% for .h files) of the replacement lists of macro definitions contained a # operator. There were no obvious patterns to the usage.
Semantics
6.10.3.2 The # operator
stringize operand
by a single character string literal preprocessing token that contains the spelling of the preprocessing token
sequence for the corresponding argument.
Commentary
Rationale
Some pre-C89 implementations decided to replace identifiers found within a string literal if they matched a
macro argument name. The replacement text is a “stringized” form of the actual argument token sequence.
This practice appears to be contrary to K&R’s definition of preprocessing in terms of token sequences The
C89 Committee declined to elaborate the syntax of string literals to the point where this practice could
be condoned; however, since the facility provided by this mechanism seems to be widely used, the C89
Committee introduced a more tractable mechanism of comparable power.
In the following example:
the # preprocessing token exists in the expanded replacement list, rather than the replacement list, and is
considered to be a punctuator rather than an operator.
Common Implementations
Microsoft C supports the preprocessor operator #@ (called the charizing operator) as an extension. It converts
its operand to a character constant.
Commentary
Whether multiple white space between preprocessing tokens has already been converted to a single white
space before this conversion is discussed elsewhere. Also there is no white space added where none existed
in the source file.
Rationale
One problem with defining the effect of stringizing is the treatment of white space occurring in macro definitions.
Where this could be discarded in the past, now upwards of one logical line may have to be retained. As
a compromise between token-based and character-based preprocessing disciplines, the C89 Committee
decided to permit white space to be retained as one bit of information: none or one. Arbitrary white space is
replaced in the string by one space character.
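The "none or one" rule can be seen in a short sketch (STR and the function names are hypothetical): runs of white space between the argument's preprocessing tokens collapse to one space, leading and trailing white space is dropped, and no space is added where none was written.

```c
#include <string.h>

#define STR(x) #x

int spaces_collapsed(void)
{
    return strcmp(STR(   a    +    b   ), "a + b") == 0;
}

int none_added(void)
{
    return strcmp(STR(a+b), "a+b") == 0;
}
```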
Coding Guidelines
Any misconceptions about the white space that will appear between preprocessing tokens that have been stringized
are a developer education issue, not a coding guideline issue.
#
escape sequence
character beginning a universal character name.
Commentary
This specification is intended to ensure that the output produced by passing the string produced by the stringize operator as an argument to printf, for instance, is the same as the visible form (with white-space characters reduced to a single space character) of the preprocessing token sequence immediately prior to being stringized (although this sequence may not exist in the visible source). In the following example:
if UCNs are mapped in translation phase 1 and @ is a supported character then the invocation of mkstr
the result of the # operator need not be "a \ b".
Common Implementations
Most implementations simply create a sequence of characters. However, processing in subsequent phases of
translation (e.g., conversion of escape sequences) may also result in undefined behavior.
Like C90, the behavior in C++ is not explicitly defined (some implementations, e.g., Microsoft C++, do not
support empty arguments).
if {1} is glued to {2} and then stringized the resulting preprocessing token is defined. However, stringizing
{1} and then attempting to glue it to {2} does not yield a defined preprocessing token (the behavior is
undefined).
When both operators occur in a replacement list, performing token gluing first would appear to give the
highest probability of having a defined result, when a stringize operator is also present. However, there is no
requirement that implementations use this order. There is no need to specify an evaluation order for the #
operator because it is unary (the evaluation order of the ## operator is discussed elsewhere).
Coding Guidelines
An order dependency on the evaluation of the # and ## operators only exists when either of them could
be applied to the same preprocessing token in a replacement list. Unlike full expression evaluation it is
not possible to use parentheses to group operands with preprocessor operators. These operators have to be
adjacent to the preprocessing tokens they operate on. However, this combination of events rarely occurs
(there are no occurrences in the .c files) and thus a guideline recommendation is not considered worthwhile.
new token by macro argument substitution. One pre-C89 implementation replaced a comment within a macro
expansion by no characters instead of the single space called for in K&R The C89 Committee considered
this practice unacceptable.
As with “stringizing,” the facility was considered desirable, but not the extant implementation of this facility,
causes concatenation of the tokens on either side of it into a new composite token.
The specification of this pasting operator is based on these principles:
• Paste operations are explicit in the source.
substituted for the formal parameter; but the actual parameter is not replaced Given, for example
6.10.3.3 The ## operator
• Pasting does not cross macro replacement boundaries.
• The token resulting from a paste operation is subject to further macro expansion.
These principles codify the essential features of prior art and are consistent with the specification of the stringizing operator.
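Two of these principles, that the actual parameter is not replaced and that the result of a paste is subject to further macro expansion, can be sketched as follows. All names here (CAT, XCAT, AB, PREFIX, and the objects) are hypothetical.

```c
#define CAT(a, b)   a ## b
#define XCAT(a, b)  CAT(a, b)   /* extra level: arguments expanded first */

#define AB      7
#define PREFIX  item

static int PREFIX_1 = 10;   /* a single identifier; PREFIX is not expanded here */
static int item_1   = 20;

int pasted_then_expanded(void)
{
    return CAT(A, B);           /* A ## B gives AB, which then expands to 7 */
}

int operand_not_expanded(void)
{
    return CAT(PREFIX, _1);     /* PREFIX is an operand of ##: gives PREFIX_1 */
}

int operand_expanded_first(void)
{
    return XCAT(PREFIX, _1);    /* PREFIX expands to item first: gives item_1 */
}
```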
This defines the term placemarker. The standard uses placemarker preprocessing tokens to describe an effect;
an implementation need not represent them internally. The need for a placemarker preprocessing token occurs because the ## operator does not cross replacement boundaries.
1961 For both object-like and function-like macro invocations, before the replacement list is reexamined for more
argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token.
Table 1961.1: Possible results of concatenating, using the ## operator, pairs of preprocessing tokens (the one appearing in the left
column followed by the one appearing in the top row) where the result might be defined (undefined denotes undefined behavior).
identifier pp-number punctuator string-literal character-constant
punctuator orundefined